The goal of this Notebook is to take the parts of NovemberPhyloProc.ipynb that I expect to go into the actual manuscript.

Figure Outline

3/20/2017 Re-ran whole pipeline end-to-end. Hits on everything (including igg gp41 0 day) except only trending on gp41 Month 6.5. However I don’t see family level groups for gp41 that relate to community structure (q < 0.2, p < 0.05). I may just try running everything end to end a few more times to see what happens. This because I want to see how consistant (or otherwise) the results are between runs. Also, I’m going to start setting seeds now.

Loading libraries, functions and data

Libraries

# only use library paths in the anaconda environment

#.libPaths(grep('anaconda3', .libPaths(), value = T))
.libPaths()
## [1] "/data/users/cram/Project/Nyvac_096_Microbiome/packrat/lib/x86_64-pc-linux-gnu/3.6.0"
## [2] "/data/users/cram/Project/Nyvac_096_Microbiome/packrat/lib-ext"                      
## [3] "/data/users/cram/Project/Nyvac_096_Microbiome/packrat/lib-R"
R.version
##                _                                       
## platform       x86_64-pc-linux-gnu                     
## arch           x86_64                                  
## os             linux-gnu                               
## system         x86_64, linux-gnu                       
## status         beta                                    
## major          3                                       
## minor          6.0                                     
## year           2019                                    
## month          04                                      
## day            11                                      
## svn rev        76379                                   
## language       R                                       
## version.string R version 3.6.0 beta (2019-04-11 r76379)
## nickname       Planting of a Tree
sessionInfo()
## R version 3.6.0 beta (2019-04-11 r76379)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.0  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] htmltools_0.3.6 tools_3.6.0     yaml_2.2.0      Rcpp_0.12.19   
##  [9] stringi_1.2.4   rmarkdown_1.10  knitr_1.20      stringr_1.3.1  
## [13] digest_0.6.18   packrat_0.4.8-1 evaluate_0.12
# https://stackoverflow.com/questions/46354826/have-a-function-that-calls-library-and-takes-either-a-package-or-its-name-as-inp


# Also return package version when loading in packages
# accept strings or functions
libver <- function(pac){

    pac <- as.character(substitute(pac))
    library(pac, character.only=TRUE)
    packageVersion(pac)
    }
#libver("dada2")
#libver("ggplot2")
libver("Cairo")
## [1] '1.5.9'
# Much of the data handling
libver('phyloseq')
## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
## Registered S3 method overwritten by 'dplyr':
##   method               from  
##   as.data.frame.tbl_df tibble
## [1] '1.22.3'
# A bunch of environments, including ggplot, dplyr, tidyr, and broom, which I use a lot
libver('tidyverse')
## Registered S3 method overwritten by 'rvest':
##   method            from
##   read_xml.response xml2
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.7
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## [1] '1.2.1'
# Mostly for concatenating ggplots
library(gridExtra); packageVersion("gridExtra")
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
## [1] '2.3'
# I use this surprisingly not a lot here.
library(vegan); packageVersion("vegan")
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-3
## [1] '2.5.3'
# For making trees
# libver('phangorn')
# A prerequesite to phangorn
# libver("DECIPHER")
# Some pre-processing stuff
# libver("dada2")
# I usually reshape with tidyverse tools now, but melt and cast are often easier in a pinch
# libver("reshape2")
# For replacing NaNs without too much thought.
# libver("imputeMissings")
# Deal with proportional data, especially useful for calculating proportionality phi
libver('compositions')
## Loading required package: tensorA
## 
## Attaching package: 'tensorA'
## The following object is masked from 'package:base':
## 
##     norm
## Loading required package: robustbase
## Loading required package: energy
## Loading required package: bayesm
## Welcome to compositions, a package for compositional data analysis.
## Find an intro with "? compositions"
## 
## Attaching package: 'compositions'
## The following objects are masked from 'package:stats':
## 
##     cor, cov, dist, var
## The following objects are masked from 'package:base':
## 
##     %*%, scale, scale.default
## [1] '1.40.2'
# Works with tidyverse to make model output tidy
libver('broom')
## [1] '0.5.0'
# Make pretty tables
libver('knitr')
## [1] '1.20'
libver('kableExtra')
## [1] '0.9.0'
# Let those pretty tables actually show up in a jupyter notebook
#library('IRdisplay')
# For bootstrapping
libver('boot')
## 
## Attaching package: 'boot'
## The following object is masked from 'package:robustbase':
## 
##     salinity
## The following object is masked from 'package:lattice':
## 
##     melanoma
## [1] '1.3.20'
# Calculate kernel regressions
libver("MiRKAT")
## Loading required package: survival
## 
## Attaching package: 'survival'
## The following object is masked from 'package:boot':
## 
##     aml
## The following object is masked from 'package:robustbase':
## 
##     heart
## Loading required package: PearsonDS
## Loading required package: GUniFrac
## Loading required package: ape
## 
## Attaching package: 'ape'
## The following object is masked from 'package:compositions':
## 
##     balance
## Loading required package: matrixStats
## 
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:robustbase':
## 
##     colMedians, rowMedians
## The following object is masked from 'package:dplyr':
## 
##     count
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## [1] '1.0.1'
libver("car")
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:boot':
## 
##     logit
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
## [1] '3.0.2'
#libver(mclust)
#libver(chemometrics)
libver(purrrlyr)
## [1] '0.0.3'
libver('qvalue')
## [1] '2.10.0'
libver("breakaway")
## [1] '4.6.11'

Functions

I have put the functions in a library file

source('libraries/library096.R')

Data

# Set upOriginal to false, if you want to used user-reprocessed data.
# Results may differ slightly from those in the manuscript due to inter-run variation
# especially in the tree-ing algorithm.
 upOriginal <- TRUE
# upOriginal <- FALSE
# For permutation tests, how fast do things need to run
# 9999 for most runs, 99999 for publication quality ones suggested
jnperm <- 99999
# Data paths
getwd()
## [1] "/data/users/cram/Project/Nyvac_096_Microbiome"
(mapping_file_path <- file.path('data', 'mapping_file_096a.csv'))
## [1] "data/mapping_file_096a.csv"
(immune_file_path <- file.path('data', 'immune096b.csv'))
## [1] "data/immune096b.csv"
if(upOriginal){
     seqtab_file_path <- file.path('data', 'seqtab.nochimNov2017.csv')
     taxa_file_path <- file.path('data', 'TaxaNov2017.csv')
     tree_path <- file.path('data', 'phylogeny096NovTree.tre')
    } else {
     seqtab_file_path <- file.path('data1', 'seqtab.nochimMar2018.csv')
     taxa_file_path <- file.path('data1', 'TaxaMar2018.csv')
     tree_path <- file.path('data1', 'phylogeny096Mar2018tre.tre')
}

seqtab_file_path
## [1] "data/seqtab.nochimNov2017.csv"
taxa_file_path
## [1] "data/TaxaNov2017.csv"
tree_path
## [1] "data/phylogeny096NovTree.tre"
# Sequence data
seqtab.nochim.data <- read.csv(seqtab_file_path)

seqtabNames = gsub('\\.', '-',
    gsub('.fastq', '', seqtab.nochim.data$X)
                   )

seqtab.nochim = as.matrix(seqtab.nochim.data[,-1])
rownames(seqtab.nochim) = seqtabNames
# Taxa names
taxa.data <- read.csv(taxa_file_path)
taxa = taxa.data[,-1]

## I reverse complemented the sequences to generate the taxonomy
# (but only in this latest re-run, not the original)
## The following undoes that reverse complement to get original sequence
#rownames(taxa) = dada2:::rc(taxa.data[,1]) 

if(upOriginal){
    rownames(taxa) = (taxa.data[,1])} else {
    rownames(taxa) = dada2:::rc(taxa.data[,1]) 
}

taxa <- as.matrix(taxa)
# Mapping file
mapping.data <- read_csv(mapping_file_path) %>%
mutate(pub_id = sapply(pub_id,  function(x) {as.numeric(gsub("096-", "", x))}))
## Parsed with column specification:
## cols(
##   SampleID = col_character(),
##   BarcodeSequence = col_character(),
##   LinkerPrimerSequence = col_character(),
##   ReversePrimer = col_character(),
##   run_prefix = col_character(),
##   pub_id = col_character(),
##   Sex = col_character(),
##   Visit = col_integer(),
##   visitRank = col_integer(),
##   RXCode = col_character(),
##   Description = col_character()
## )
## Warning: The `printer` argument is deprecated as of rlang 0.3.0.
## This warning is displayed once per session.
#mapping = mapping.data[,-1]
#rownames(mapping) = mapping.data[,1]
#mapping <- as.matrix(mapping)
head(mapping.data)
## # A tibble: 6 x 11
##   SampleID BarcodeSequence LinkerPrimerSeq… ReversePrimer run_prefix pub_id
##   <chr>    <chr>           <chr>            <chr>         <chr>       <dbl>
## 1 Sample-… TTACGC          GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201     282
## 2 Sample-… TTAGGT          GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201     282
## 3 Sample-… TTCCAC          GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201     176
## 4 Sample-… TTGTAC          GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201     176
## 5 Sample-… TTGTGT          GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201     123
## 6 Sample-… TATCAC          GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201      49
## # ... with 5 more variables: Sex <chr>, Visit <int>, visitRank <int>,
## #   RXCode <chr>, Description <chr>
# Immune Data
immune.data0 <- read_csv(immune_file_path)
## Parsed with column specification:
## cols(
##   visitno = col_integer(),
##   rx_code = col_character(),
##   type = col_character(),
##   antigen = col_character(),
##   mag = col_double(),
##   mag_bl = col_double(),
##   response = col_integer(),
##   day = col_integer(),
##   month = col_double(),
##   ct = col_character(),
##   response_j = col_double(),
##   assay = col_character(),
##   pub_id = col_character()
## )
immune.data <- mutate(immune.data0, pub_id = sapply(pub_id,  function(x) {as.numeric(gsub("096-", "", x))}))
levels(immune.data$antigen) <- gsub("[ /]", ".", levels(immune.data$antigen))
head(immune.data)
## # A tibble: 6 x 13
##   visitno rx_code type  antigen   mag mag_bl response   day month ct   
##     <int> <chr>   <chr> <chr>   <dbl>  <dbl>    <int> <int> <dbl> <chr>
## 1       5 CTRL    IgA   gp41     180.   214.        0    42   1.5 CTRL 
## 2       7 CTRL    IgA   gp41     175.   214.        0    98   3.5 CTRL 
## 3       9 CTRL    IgA   gp41     177.   214.        0   182   6.5 CTRL 
## 4      12 CTRL    IgA   gp41     173    214.        0   364  12   CTRL 
## 5       5 CTRL    IgA   p24      447.   454.        0    42   1.5 CTRL 
## 6       7 CTRL    IgA   p24      767.   454.        0    98   3.5 CTRL 
## # ... with 3 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>
# Phylogenetic tree
seqs <- dada2::getSequences(seqtab.nochim)
names(seqs) <- seqs

pt <- ape::read.tree(file=tree_path)

pt2 <- phangorn::midpoint(pt)
immune.data$antigen %>% unique
## [1] "gp41"               "p24"                "Con.6.gp120.B"     
## [4] "ZM96.gp140"         "gp70_B.CaseA_V1_V2" "ANY.ENV.PTEG"

Save options to a variable

par0 <- options()

Pre-processing

## minimal sample identification data
pub_id_key <- unique(immune.data[,c("pub_id", "rx_code", "ct")])

sample_sm0 <- dplyr::select(mapping.data, SampleID, pub_id, sex = Sex, muVisit = Visit, muVisitRank = visitRank)
sample_sm <- left_join(sample_sm0, pub_id_key, by = "pub_id") %>%
as.data.frame %>%
tibble::column_to_rownames(var = "SampleID")
## Warning: `chr_along()` is deprecated as of rlang 0.2.0.
## This warning is displayed once per session.
## Warning: Column `pub_id` has different attributes on LHS and RHS of join
# rownames(sample_sm)
# head(sample_sm)
# Make raw phyloseq object
ot <- otu_table(seqtab.nochim, taxa_are_rows=FALSE)

tt <- tax_table(taxa)
dimnames(tt) = dimnames(taxa)

spl <- sample_data(sample_sm)

psN is a really raw phyloseq object * OTU names are given as accession numbers * Numbers are in total counts * We have samples from both time points

# Quite raw phyloseq object. Species names are given as accession numbers
psN <- phyloseq(ot, tt, spl, pt2)

psN
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 960 taxa and 47 samples ]
## sample_data() Sample Data:       [ 47 samples by 6 sample variables ]
## tax_table()   Taxonomy Table:    [ 960 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 960 tips and 959 internal nodes ]

I want to make a phyloseq object for use in essentally all of the subsequent analyses. Features include: * Some basic taxonomic pre-processing. * No uncharacterized phyla. * Only OTUs that show up at least 10% of the time in the final data set . * Do this after filtering samples. * No tip-glomming. I’ll save that untill later. * Immune data is included in the sample data table. * We’ll do Andrew’s representitive IgGs and IgAs. * We only have samples from visit 1. * We only have samples from experemental (not control) groups.

immune.data %>% pull(type) %>% unique
## [1] "IgA"  "IgG"  "CD4+"
#immune.data %>% unite(type_antigen ,type, antigen, sep = "_")
immune.data %>% dplyr::select(pub_id, month, type, antigen, mag) %>% 
filter(month %in% c(0, 6.5, 12)) %>%
unite(type_antigen, type, antigen, sep = "_") %>%
unite(type_antigen_month,type_antigen, month, sep = "_Month_") %>%
spread(key = type_antigen_month, value = mag, drop = TRUE) -> immune.table
immune.table %>% head
## # A tibble: 6 x 25
##   pub_id `CD4+_ANY.ENV.P… `CD4+_ANY.ENV.P… `CD4+_ANY.ENV.P…
##    <dbl>            <dbl>            <dbl>            <dbl>
## 1      3            0.025           NA               0.0678
## 2      4            0.025           NA              NA     
## 3      5            0.025            0.025           0.025 
## 4      7            0.025            0.025           0.025 
## 5     12            0.025            0.025           0.0515
## 6     14            0.025            0.025           0.302 
## # ... with 21 more variables: IgA_gp41_Month_0 <dbl>,
## #   IgA_gp41_Month_12 <dbl>, IgA_gp41_Month_6.5 <dbl>,
## #   IgA_p24_Month_0 <dbl>, IgA_p24_Month_12 <dbl>,
## #   IgA_p24_Month_6.5 <dbl>, IgG_Con.6.gp120.B_Month_0 <dbl>,
## #   IgG_Con.6.gp120.B_Month_12 <dbl>, IgG_Con.6.gp120.B_Month_6.5 <dbl>,
## #   IgG_gp41_Month_0 <dbl>, IgG_gp41_Month_12 <dbl>,
## #   IgG_gp41_Month_6.5 <dbl>, IgG_gp70_B.CaseA_V1_V2_Month_0 <dbl>,
## #   IgG_gp70_B.CaseA_V1_V2_Month_12 <dbl>,
## #   IgG_gp70_B.CaseA_V1_V2_Month_6.5 <dbl>, IgG_p24_Month_0 <dbl>,
## #   IgG_p24_Month_12 <dbl>, IgG_p24_Month_6.5 <dbl>,
## #   IgG_ZM96.gp140_Month_0 <dbl>, IgG_ZM96.gp140_Month_12 <dbl>,
## #   IgG_ZM96.gp140_Month_6.5 <dbl>

Initial Taxonomic filter.

Some investegation suggested by the phyloseq tutorials to identify phyla for removal, and to identify an abundance threshold

psN
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 960 taxa and 47 samples ]
## sample_data() Sample Data:       [ 47 samples by 6 sample variables ]
## tax_table()   Taxonomy Table:    [ 960 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
psN %>% subset_samples(!is.na(pub_id))
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 960 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 6 sample variables ]
## tax_table()   Taxonomy Table:    [ 960 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
# skip the blanks
psN %>% subset_samples(!is.na(pub_id)) %>%
# OTUs must be characterized to phylum
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) -> psN_hasPhylum
psN_hasPhylum
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 929 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 6 sample variables ]
## tax_table()   Taxonomy Table:    [ 929 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 929 tips and 928 internal nodes ]
# from 960 to 929 otus

Identifying and removing phyla with very few taxa in them

prevdf = apply(X = otu_table(psN_hasPhylum),
                 MARGIN = ifelse(taxa_are_rows(psN_hasPhylum), yes = 1, no = 2),
                 FUN = function(x){sum(x > 0)})
# Add taxonomy and total read counts to this data.frame
prevdf = data.frame(Prevalence = prevdf,
                      TotalAbundance = taxa_sums(psN_hasPhylum),
                      tax_table(psN_hasPhylum))

plyr::ddply(prevdf, "Phylum", function(df1){cbind(mean(df1$Prevalence),sum(df1$Prevalence))})
##            Phylum        1    2
## 1  Actinobacteria 7.953846  517
## 2   Bacteroidetes 9.414634 1930
## 3   Elusimicrobia 2.000000    2
## 4      Firmicutes 9.711604 5691
## 5    Fusobacteria 5.875000   94
## 6  Proteobacteria 6.960000  348
## 7   Synergistetes 3.750000   15
## 8     Tenericutes 4.000000    4
## 9 Verrucomicrobia 6.000000    6
filterPhyla = c("Verrucomicrobia", "Tenericutes", "Elusimicrobia", "Synergistetes")
psN_MainPhyla = subset_taxa(psN_hasPhylum, !Phylum %in% filterPhyla)
psN_MainPhyla
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 922 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 6 sample variables ]
## tax_table()   Taxonomy Table:    [ 922 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 922 tips and 921 internal nodes ]
# Determining abundance threshold
prevdf1 = subset(prevdf, Phylum %in% get_taxa_unique(psN_MainPhyla, "Phylum"))
ggplot(prevdf1, aes(TotalAbundance, Prevalence / nsamples(psN_hasPhylum),color=Phylum)) +
  # Include a guess for parameter
  geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) + geom_point(size = 2, alpha = 0.7) +
  scale_x_log10() +  xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
  facet_wrap(~Phylum) + theme(legend.position="none")
## Warning: Transformation introduced infinite values in continuous x-axis

# Determining abundance threshold
prevdf1 = subset(prevdf, Phylum %in% get_taxa_unique(psN_MainPhyla, "Phylum"))
ggplot(prevdf1, aes(TotalAbundance, Prevalence / nsamples(psN_hasPhylum),color=Phylum)) +
  # Include a guess for parameter
  geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) + geom_point(size = 2, alpha = 0.7) +
  scale_x_continuous() +  xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
  facet_wrap(~Phylum) + theme(legend.position="none")

Are differences between participants greater than differences within participants accross time-points?

Make a phyloseq object like psN2 but with all participants, psN2A Code copied from above.

Constructing psN2

(phyloseq object of relative abundances)

And psN1 (phyloseq object of counts)

psN1A and psN2A include all participants, and will be used to look at variability within participants

psN %>%
# add all the immune data
phylo_join(immune.table, by = "pub_id") %>%
# only use data from humans (no extraction controls)
subset_samples(is.finite(muVisitRank)) %>%
# only otus from known taxa that show up frequently enough
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) %>%
subset_taxa(!Phylum %in% filterPhyla) %>%
# only otus that show up in at least 10% of samples
prevalence_filter_taxa %>%
# convert to relative abundance

tag_phyloseq%>%
# Instead of naming each taxon with its full sequence, we use the "tag" instead
swap.phyloseq.taxnames %>%
pass -> psN1A # Save pre relative abundance transformation
## Warning: `list_len()` is deprecated as of rlang 0.2.0.
## Please use `new_list()` instead.
## This warning is displayed once per session.
## Warning: Setting row names on a tibble is deprecated.
# add is-male
manColumn <- psN1A %>% sample_data %>% as('data.frame') %>% rownames_to_column  %>% mutate(isMale = testIsMaleVec(sex)) %>% dplyr::select(rowname, isMale)
psN1A <- phylo_join(psN1A, manColumn, by = 'rowname')

## psN2 is like psN1 but with relative abundances
psN1A %>%
transform_sample_counts(function(x) {x/sum(x)}) %>%
# The "tag" is a new name that takes into account the rest of the taxonomy data
# the tag may need to be updated after any agglomeration
pass-> psN2A
# filter to just microbiome visit 1 and experemental treatments
psN1A %>%
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
pass -> psN1

psN2A %>%
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
pass -> psN2
# Calculate weighted unifrac distances and role those in.
psN2.wuf <- phyloseq::distance(psN2, method = "wunifrac")
psN2.pcoa <- capscale(psN2.wuf ~ 1)
psN2.pcoa.df <- psN2.pcoa %>% scores(display = "sites") %>%
        as.data.frame %>% 
        rownames_to_column %>% 
        dplyr::select('rowname', 'MDS1', 'MDS2') %>%
        mutate(rMDS1 = rank(MDS1)) %>% # rank order of MDS1
        mutate(rrMDS1 = formatC(format = "d", rMDS1, flag = "0", width=ceiling(log10(max(rMDS1))))) %>%
        unite(newname, rrMDS1, rowname, sep = "_", remove = FALSE) %>%
        dplyr::select(-rrMDS1)

psN2 %>%
phylo_join(
    psN2.pcoa.df,
    by = 'rowname'
) -> psN2

## Even if the data are counts, 
## the weighted unifrac pcoa is still done on the relative abundances
psN1 %>%
phylo_join(
    psN2.pcoa.df,
    by = 'rowname'
) -> psN1

psN2A
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 651 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 31 sample variables ]
## tax_table()   Taxonomy Table:    [ 651 taxa by 10 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN1A
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 651 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 31 sample variables ]
## tax_table()   Taxonomy Table:    [ 651 taxa by 10 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN2
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 651 taxa and 21 samples ]
## sample_data() Sample Data:       [ 21 samples by 35 sample variables ]
## tax_table()   Taxonomy Table:    [ 651 taxa by 10 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN1
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 651 taxa and 21 samples ]
## sample_data() Sample Data:       [ 21 samples by 35 sample variables ]
## tax_table()   Taxonomy Table:    [ 651 taxa by 10 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 651 tips and 650 internal nodes ]

Data Curation Post mortum

How many taxa were still present after each filtering step?

# Find number of taxa in available samples
psN %>%
# add all the immune data
phylo_join(immune.table, by = "pub_id") %>%
# filter to just microbiome visit 1 and experemental treatments
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
prevalence_filter_taxa(thresh = 0) %>%
pass-> psInSamples
(NInSamples <- dim(otu_table(psInSamples))[2])
## [1] 960
# Number of taxa with unidentified phyla
psInSamples %>%
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) %>%
pass -> psIdentifiedPhylum
(NUnkPhylum <- NInSamples - dim(otu_table(psIdentifiedPhylum))[2])
## [1] 31
filterPhyla
## [1] "Verrucomicrobia" "Tenericutes"     "Elusimicrobia"   "Synergistetes"
# Phyla removed because they are in filterPhyla 
# -- each of which show up fewer than 20 times in the data set
psIdentifiedPhylum %>%
subset_taxa(!Phylum %in% filterPhyla) %>%
pass -> psNotPhylaFiltered
(NFiltPhyla <- dim(otu_table(psIdentifiedPhylum))[2] - dim(otu_table(psNotPhylaFiltered))[2])
## [1] 7
# Taxa removed because there were in fewer than 10% of the samples
psNotPhylaFiltered %>%
prevalence_filter_taxa %>%
pass -> psPFT
dim(otu_table(psNotPhylaFiltered))[2] - dim(otu_table(psPFT))[2]
## [1] 386

Immune figure

How to participants’ immune profiles change over time?

# When were participants vaccinated?
# Copied from protocol apendix E
# visitno 1 is a screening visit, I assign it NaN
dayTable = data.frame(
    visitno = seq(from = 1, to = 14, by = 1),
    day = c(NaN, 0, 14, 28, 42, 84, 98, 168, 182, 196, 273, 364, 455, 545),
    month = c(NaN, 0, 0.5, 1, 1.5, 3, 3.5, 6, 6.5, 7, 9, 12, 15, 18)
)
vac <- data.frame(
    visitno = c(2, 4, 6, 8)
    )
vac <- left_join(vac, dayTable, by = 'visitno')

vac
##   visitno day month
## 1       2   0     0
## 2       4  28     1
## 3       6  84     3
## 4       8 168     6
# Representitive antigens for further considerations
# These are essentially zero (mag = 1) at baseline
ants1 <- c('Con.6.gp120.B', 'ZM96.gp140', 'gp70_B.CaseA_V1_V2')
# These have measurable baseline magnitudes
ants2 <- c('gp41', 'p24')
donor.immune <-  psN2 %>% sample_data %>% as('data.frame') %>% dplyr::select(pub_id) %>%
left_join(immune.data, by = 'pub_id')
## Warning: Column `pub_id` has different attributes on LHS and RHS of join
donor.immune %>% head
##   pub_id visitno rx_code type antigen    mag mag_bl response day month ct
## 1    282       5      T1  IgA    gp41 352.00  109.5        0  42   1.5  T
## 2    282       7      T1  IgA    gp41 276.50  109.5        0  98   3.5  T
## 3    282       9      T1  IgA    gp41 333.25  109.5        0 182   6.5  T
## 4    282      12      T1  IgA    gp41   1.00  109.5        0 364  12.0  T
## 5    282       5      T1  IgA     p24 313.50  329.8        0  42   1.5  T
## 6    282       7      T1  IgA     p24 377.50  329.8        0  98   3.5  T
##    response_j assay
## 1 -0.18339353  BAMA
## 2 -0.10360764  BAMA
## 3  0.19882813  BAMA
## 4  0.15338760  BAMA
## 5  0.02648181  BAMA
## 6 -0.10988643  BAMA
psN %>% sample_data %>%
as('data.frame') %>% 
filter(!is.na(pub_id)) %>%
pull(pub_id) %>%
unique %>%
pass -> microbiomeCohort
immune.data %>% filter(pub_id %in% microbiomeCohort) %>%
pass -> donor.immune

donor.immune %>% head
## # A tibble: 6 x 13
##   visitno rx_code type  antigen   mag mag_bl response   day month ct   
##     <int> <chr>   <chr> <chr>   <dbl>  <dbl>    <int> <int> <dbl> <chr>
## 1       5 T1      IgA   gp41     352    110.        0    42   1.5 T    
## 2       7 T1      IgA   gp41     276.   110.        0    98   3.5 T    
## 3       9 T1      IgA   gp41     333.   110.        0   182   6.5 T    
## 4      12 T1      IgA   gp41       1    110.        0   364  12   T    
## 5       5 T1      IgA   p24      314.   330.        0    42   1.5 T    
## 6       7 T1      IgA   p24      378.   330.        0    98   3.5 T    
## # ... with 3 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>
options(par0)
iggplot <- immune.data %>%
mutate(inCohort = pub_id %in% microbiomeCohort) %>%
filter(type == 'IgG', antigen %in% c(ants1, ants2)) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id, colour = inCohort, alpha = inCohort)) +
geom_line() +
geom_point() +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen()) +
theme_bw() +
theme(strip.text.y = element_text(angle = 0),
      axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^(0:5)) +
scale_colour_manual(values = c("black", "red")) +
scale_alpha_manual(values = c(.6, 1)) + 
labs(y = "BAMA Response Magnitude")



ggsave('figures/useiggsAllParticipants.svg')
## Saving 7 x 5 in image
# To fix. Control groups don't show up in this version.
iggplot

iggplot <- immune.data %>%
mutate(inCohort = pub_id %in% microbiomeCohort) %>%
filter(type %in% c('IgA', 'CD4+') & antigen %in% c(ants2, 'ANY.ENV.PTEG'))%>%
mutate(antigen = factor(antigen, levels = c(ants2, 'ANY.ENV.PTEG'))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id, colour = inCohort, alpha = inCohort)) +
geom_line() +
geom_point() +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen(), scales = 'free_y') +
theme_bw() +
theme(strip.text.y = element_text(angle = 0),
      axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^c(
    seq(from = -2, to = 0, by = 0.25), seq(from = 0, to = 5, by = 1)
), labels = function(x) round(as.numeric(x), digits=3)) +
scale_colour_manual(values = c("black", "red")) +
scale_alpha_manual(values = c(.6, 1)) +
labs(y = "BAMA Response Magnitude")

iggplot

ggsave('figures/useIgACD4AllParticipants.svg')
## Saving 7 x 5 in image
# To fix. Control groups don't show up in this version.
iggplot <- donor.immune %>% filter(type == 'IgG', antigen %in% c(ants1, ants2)) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id)) + geom_point(alpha = 0.6) + geom_line(alpha = 0.4) +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen()) +
theme_bw() + theme(strip.text.y = element_text(angle = 0), axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^(0:5)) +
labs(y = "BAMA Response Magnitude")
iggplot

ggsave('figures/useiggs.svg') # I can no longer save pngs with transparency, going to svg
## Saving 7 x 5 in image
# To fix. Control groups don't show up in this version.

Magnitude and variance of vaccine response per group

colnames(immune.data)
##  [1] "visitno"    "rx_code"    "type"       "antigen"    "mag"       
##  [6] "mag_bl"     "response"   "day"        "month"      "ct"        
## [11] "response_j" "assay"      "pub_id"

Plan. Calculate mean and variance of each antigen-type at visit 9, and at baseline.

#
geomean <- function(x, na.rm = FALSE, trim = 0, ...)
{
exp(mean(log(x, ...), na.rm = na.rm, trim = trim, ...))
}
 
geosd <- function(x, na.rm = FALSE, ...)
{
exp(sd(log(x, ...), na.rm = na.rm, ...))
}
immune.data %>% dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id, antigen, visitno, mag, mag_bl, type) %>% group_by(antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), sd_mag = geosd(mag), sd_bl = geosd(mag_bl)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl)
## # A tibble: 8 x 8
## # Groups:   antigen [6]
##   antigen type  mean_mag mean_bl sd_mag sd_bl var_over_mean_m…
##   <chr>   <chr>    <dbl>   <dbl>  <dbl> <dbl>            <dbl>
## 1 ANY.EN… CD4+   9.20e-2   NA      2.51 NA           68.7     
## 2 Con.6.… IgG    1.14e+4    2.54   8.33  4.64         0.00607 
## 3 gp41    IgA    9.87e+1   15.9    6.42  7.04         0.417   
## 4 gp41    IgG    5.56e+3  208.     3.76  4.91         0.00254 
## 5 gp70_B… IgG    1.98e+3    1.48   7.40  2.66         0.0277  
## 6 p24     IgA    2.50e+3  422.     4.02  2.48         0.00646 
## 7 p24     IgG    2.55e+4  152.     2.44  6.34         0.000233
## 8 ZM96.g… IgG    1.81e+3    8.83   7.07  9.01         0.0276  
## # ... with 1 more variable: var_over_mean_bl <dbl>

Mean variance and variance over mean of each value.

immune.data %>% dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id,rx_code, antigen, visitno, mag, mag_bl, type) %>% group_by(rx_code, antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), sd_mag = geosd(mag), sd_bl = geosd(mag_bl)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl) %>% 
gather(key = "meas", value = "value", mean_mag, mean_bl, sd_mag, sd_bl, var_over_mean_mag, var_over_mean_bl) %>% 
group_by(antigen, type, meas) %>% summarize(mean_val = mean(value)) %>%
spread(key = "meas", value = "mean_val")
## # A tibble: 8 x 8
## # Groups:   antigen, type [8]
##   antigen type  mean_bl mean_mag sd_bl sd_mag var_over_mean_bl
##   <chr>   <chr>   <dbl>    <dbl> <dbl>  <dbl>            <dbl>
## 1 ANY.EN… CD4+    NA     9.76e-2 NA      2.37          NA     
## 2 Con.6.… IgG      2.60  1.21e+4  4.77   8.59           9.04  
## 3 gp41    IgA     17.4   1.06e+2  7.01   6.62           4.30  
## 4 gp41    IgG    215.    6.43e+3  4.94   3.51           0.135 
## 5 gp70_B… IgG      1.51  1.99e+3  2.61   7.93           4.64  
## 6 p24     IgA    428.    3.34e+3  2.51   3.38           0.0147
## 7 p24     IgG    155.    2.56e+4  6.94   2.44           0.466 
## 8 ZM96.g… IgG      8.85  1.98e+3  9.47   7.15          10.6   
## # ... with 1 more variable: var_over_mean_mag <dbl>

As above, but this time, calculated seperately for each treatment group and then those values averaged. ZM96 and gp70 surprisingly large variance over mean. Maybe should look at delta magnitude

immune.data%>% head
## # A tibble: 6 x 13
##   visitno rx_code type  antigen   mag mag_bl response   day month ct   
##     <int> <chr>   <chr> <chr>   <dbl>  <dbl>    <int> <int> <dbl> <chr>
## 1       5 CTRL    IgA   gp41     180.   214.        0    42   1.5 CTRL 
## 2       7 CTRL    IgA   gp41     175.   214.        0    98   3.5 CTRL 
## 3       9 CTRL    IgA   gp41     177.   214.        0   182   6.5 CTRL 
## 4      12 CTRL    IgA   gp41     173    214.        0   364  12   CTRL 
## 5       5 CTRL    IgA   p24      447.   454.        0    42   1.5 CTRL 
## 6       7 CTRL    IgA   p24      767.   454.        0    98   3.5 CTRL 
## # ... with 3 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>
immune.data %>% 
mutate(mag_delta = mag / mag_bl) %>%
mutate(mag_delta = if_else(mag_delta < 1, 1, mag_delta)) %>%
dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id,rx_code, antigen, visitno, mag, mag_bl, mag_delta, type) %>% group_by(rx_code, antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), mean_delta = geomean(mag_delta), sd_mag = geosd(mag), sd_bl = geosd(mag_bl), sd_delta = geosd(mag_delta)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl, var_over_mean_delta = sd_delta^2/mean_delta) %>% 
gather(key = "meas", value = "value", mean_mag, mean_bl, sd_mag, sd_bl, var_over_mean_mag, var_over_mean_bl, mean_delta, sd_delta, var_over_mean_delta) %>% 
group_by(antigen, type, meas) %>% summarize(mean_val = mean(value)) %>%
spread(key = "meas", value = "mean_val") %>%
arrange(type)
## # A tibble: 8 x 11
## # Groups:   antigen, type [8]
##   antigen type  mean_bl mean_delta mean_mag sd_bl sd_delta sd_mag
##   <chr>   <chr>   <dbl>      <dbl>    <dbl> <dbl>    <dbl>  <dbl>
## 1 ANY.EN… CD4+    NA         NA     9.76e-2 NA       NA      2.37
## 2 gp41    IgA     17.4        8.97  1.06e+2  7.01     5.63   6.62
## 3 p24     IgA    428.         7.27  3.34e+3  2.51     3.89   3.38
## 4 Con.6.… IgG      2.60    5310.    1.21e+4  4.77    13.6    8.59
## 5 gp41    IgG    215.        35.8   6.43e+3  4.94     6.98   3.51
## 6 gp70_B… IgG      1.51    1417.    1.99e+3  2.61     9.60   7.93
## 7 p24     IgG    155.       176.    2.56e+4  6.94     7.87   2.44
## 8 ZM96.g… IgG      8.85     239.    1.98e+3  9.47    13.9    7.15
## # ... with 3 more variables: var_over_mean_bl <dbl>,
## #   var_over_mean_delta <dbl>, var_over_mean_mag <dbl>

Number of participants per group

All participants

immune.data %>% 
group_by(rx_code) %>%
summarize(Unique_ids = n_distinct(pub_id))
## # A tibble: 5 x 2
##   rx_code Unique_ids
##   <chr>        <int>
## 1 CTRL            16
## 2 T1              20
## 3 T2              20
## 4 T3              20
## 5 T4              20

Participants with microbiome data

donor.immune %>% 
group_by(rx_code) %>%
summarize(Unique_ids = n_distinct(pub_id))
## # A tibble: 5 x 2
##   rx_code Unique_ids
##   <chr>        <int>
## 1 CTRL             2
## 2 T1               7
## 3 T2               3
## 4 T3               6
## 5 T4               5

Further breakdown of participants providing microbiota per group

Number of participants with a given day of first sample.

psN2 %>% sample_data %>% data.frame %>% group_by(muVisit) %>% summarize(n = length(muVisit))
## # A tibble: 3 x 2
##   muVisit     n
##     <int> <int>
## 1       2     3
## 2       9    11
## 3      12     7

How many donors were there from each treatment?

psN2 %>% sample_data %>% data.frame %>% group_by(rx_code) %>% summarize(n = length(muVisit))
## # A tibble: 4 x 2
##   rx_code     n
##   <chr>   <int>
## 1 T1          7
## 2 T2          3
## 3 T3          6
## 4 T4          5

Breakdown by both visit and treatment

sampleBreakdown <- psN2 %>% sample_data %>% data.frame %>% group_by(muVisit, rx_code) %>% summarize(n = length(muVisit)) %>% spread(key = rx_code, value = n, fill = 0) %>% mutate(total = T1 + T2 + T3 + T4)

sbcs <- colSums(sampleBreakdown)

sampleBreakdown <- bind_rows(sampleBreakdown, sbcs)

sampleBreakdown %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
#kable_styling("striped", "hover", full_width = F) %>%
#collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass-> sampleBreakdown.html
sampleBreakdown.html
muVisit T1 T2 T3 T4 total
2 3 0 0 0 3
9 4 2 5 0 11
12 0 1 1 5 7
23 7 3 6 5 21
sampleBreakdown.html %>% cat(file = 'tables/sampleBreakdown.html')

Weighted Unifrac

Variability within vs between participants

psN2A.wuf <- phyloseq::distance(psN2A, method = "wunifrac")
psN2A %>% sample_data %>% .[1:5, 1:10]
##           pub_id    sex muVisit muVisitRank rx_code ct
## Sample-57    282   Male       9           1      T1  T
## Sample-58    282   Male      12           2      T1  T
## Sample-60    176 Female       9           1      T1  T
## Sample-61    176 Female      12           2      T1  T
## Sample-62    123 Female      12           1      T4  T
##           CD4._ANY.ENV.PTEG_Month_0 CD4._ANY.ENV.PTEG_Month_12
## Sample-57                     0.025                 0.02500000
## Sample-58                     0.025                 0.02500000
## Sample-60                     0.025                 0.06841053
## Sample-61                     0.025                 0.06841053
## Sample-62                     0.025                 0.42710324
##           CD4._ANY.ENV.PTEG_Month_6.5 IgA_gp41_Month_0
## Sample-57                  0.06146979            109.5
## Sample-58                  0.06146979            109.5
## Sample-60                  0.16678641            149.5
## Sample-61                  0.16678641            149.5
## Sample-62                  0.70855159             33.5
psN2A %>% sample_data %>% as("data.frame") %>% rownames_to_column(var = "Sample") %>% dplyr::select(Sample, pub_id) %>%
pass -> S2P
All.equal <- Vectorize(function(x,y){x == y})
## Convert distance matrix into long form matrix
psN2A.wuf %>% as.matrix %>% as.data.frame %>%
rownames_to_column(var = "SampleX")%>%
gather(key = "SampleY", value = "WufDist", -SampleX) %>%
left_join(S2P, by = c("SampleX" = "Sample")) %>% rename(pub_id_x = pub_id) %>%
left_join(S2P, by = c("SampleY" = "Sample")) %>% rename(pub_id_y = pub_id) %>%
mutate(SampleX = as.numeric(str_extract(SampleX, "[0-9][0-9]"))) %>%
mutate(SampleY = as.numeric(str_extract(SampleY, "[0-9][0-9]"))) %>%
## discard diagonal discard and upper half of triangular matrix
filter(SampleX < SampleY) %>%
mutate(isSamePerson = All.equal(pub_id_x, pub_id_y)) %>%
# ## discard cases where pub_id is unknown
# filter(is.finite(pub_id_x) & is.finite(pub_id_y))
pass -> AllWufDist

AllWufDist %>% head
##   SampleX SampleY   WufDist pub_id_x pub_id_y isSamePerson
## 1      57      58 0.3769328      282      282         TRUE
## 2      57      60 0.2754723      282      176        FALSE
## 3      58      60 0.3852072      282      176        FALSE
## 4      57      61 0.4105227      282      176        FALSE
## 5      58      61 0.3384203      282      176        FALSE
## 6      60      61 0.3868663      176      176         TRUE
#AllWufDist %>% ggplot(aes(x = isSamePerson, y = WufDist)) + geom_violin() + geom_dotplot(binaxis = "y", stackdir = "center", binwidth = .005)

Get Mean values for between participant and within participant weighted unifrac distances.

WufMeans <- AllWufDist %>% group_by(isSamePerson) %>% summarize(mean = mean(WufDist))
WufMeans
## # A tibble: 2 x 2
##   isSamePerson  mean
##   <lgl>        <dbl>
## 1 FALSE        0.497
## 2 TRUE         0.411
#SameAndDiff <- data.frame(comparison = c("different", "same"), WufMeans$mean)
SameAndDiff <- WufMeans %>% spread(key = isSamePerson, value = mean) %>% rename(between = 'FALSE', within = 'TRUE') %>% mutate(diff = between-within)
SameAndDiff
## # A tibble: 1 x 3
##   between within   diff
##     <dbl>  <dbl>  <dbl>
## 1   0.497  0.411 0.0860
#SameAndDiff[1,2] - SameAndDiff[2,2] # Difference in weighted unifrac dissimilarity between same and different partcipants

Bootstrapped confidence intervals

Bootstrap some confidence intervals of within and between participant weighted unifrac distances.

# Split the data
SamePersonWufDist <- AllWufDist %>% filter(isSamePerson) #%>% dplyr::select(WufDist)
DifferentPersonWufDist <- AllWufDist %>% filter(!isSamePerson)
set.seed(334)

bootsSame <- rsample::bootstraps(SamePersonWufDist, times = 10000)
bootsDifferent <- rsample::bootstraps(DifferentPersonWufDist, times = 10000)
mean_of_bootstrap <- function(split){
    locVals <- rsample::analysis(split)$WufDist
    mean(locVals)
}
boot_meansSame <- bootsSame %>% mutate(mean = map_dbl(splits, mean_of_bootstrap)) %>% dplyr::select(mean)

boot_meansDifferent <- bootsDifferent %>% mutate(mean = map_dbl(splits, mean_of_bootstrap)) %>% dplyr::select(mean)

boot_means <- bind_cols(within = boot_meansSame$mean, between = boot_meansDifferent$mean) %>%
mutate(isDifferentBigger = between>within, 
      DifMinusSame = between - within)
#boot_means

1- sum(boot_means$isDifferentBigger)/length(boot_means$isDifferentBigger)
## [1] 0.012

The above is a bootstrapped P value that the two are different from eachother. Per a conversation I had with Klaus Hubert. Still need to find a justification that this approach is legit.

boot_means %>% ggplot(aes(x = DifMinusSame)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

A histogram of bootstrapped differences between within participant and between participant mean values.

quantile(boot_means$DifMinusSame, c(0.025, 0.5, 0.975))
##       2.5%        50%      97.5% 
## 0.01273644 0.08721308 0.15521066

95% confidence intervals of the differences between same and different person microbiota.

Permutation based P values

PermWufDist <- AllWufDist %>% modelr::permute(10000, WufDist)
mean_of_is_same <- function(df){df %>% as.data.frame %>% group_by(isSamePerson) %>% summarize(mean(WufDist)) %>% spread(isSamePerson, `mean(WufDist)`)}
test <- PermWufDist %>% pull(perm) %>% .[[1]] %>% as.data.frame
test %>% head
##   SampleX SampleY   WufDist pub_id_x pub_id_y isSamePerson
## 1      57      58 0.2863799      282      282         TRUE
## 2      57      60 0.5667858      282      176        FALSE
## 3      58      60 0.6974452      282      176        FALSE
## 4      57      61 0.8762536      282      176        FALSE
## 5      58      61 0.3909422      282      176        FALSE
## 6      60      61 0.8243214      176      176         TRUE
mean_of_is_same(test)
## # A tibble: 1 x 2
##   `FALSE` `TRUE`
##     <dbl>  <dbl>
## 1   0.496  0.462
SameAndDiff
## # A tibble: 1 x 3
##   between within   diff
##     <dbl>  <dbl>  <dbl>
## 1   0.497  0.411 0.0860
permutedMeans <- PermWufDist %>% mutate(meanvals = map(perm, mean_of_is_same)) %>% dplyr::select(meanvals) %>% unnest %>% 
rename(between = `FALSE`, within = `TRUE`) %>%
mutate(diff = between - within, absdif = abs(diff)) %>%
mutate(isExtreme = absdif >= SameAndDiff$diff, isExtreme1Tail = diff >= SameAndDiff$diff)
#permutedMeans %>% ggplot(aes(x = diff)) + geom_histogram() + geom_vline(xintercept = SameAndDiff$diff)
permutedMeans %>% ggplot(aes(x = diff)) + geom_histogram() + geom_vline(xintercept = SameAndDiff$diff)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Fraction of permuted values less extreme than difference between same and different.

#permutedMeans %>% mutate(isExtreme = absdif >= SameAndDiff$diff)
sum(permutedMeans$isExtreme) / length(permutedMeans$isExtreme)
## [1] 0.0153
sum(permutedMeans$isExtreme1Tail) / length(permutedMeans$isExtreme1Tail)
## [1] 0.0073

Two and one tailed permuted p-values

Plot of confedence interval and raw data

options(repr.plot.width=6, repr.plot.height= 6)
boot_means  %>% dplyr::select(within, between) %>% gather(key = "comparison", value = "WufDist") %>% ggplot(aes(x = comparison, y = WufDist)) + 
geom_dotplot(data = AllWufDist %>% mutate(isSamePerson2 = if_else(isSamePerson, "within", "between"))
             , aes(x = isSamePerson2, y = WufDist), binaxis = "y", stackdir = "center", binwidth = .01, colour = "gray40", fill = "white") + 
geom_violin(fill = NA) + 
geom_point(data = data.frame(comparison = c("within", "between"), WufDist = WufMeans$mean), aes(x = comparison, y = c(WufDist[2], WufDist[1])), shape = 22, fill = "black", size = 2)+
theme_bw() +
scale_x_discrete(limits = c("within", "between")) + labs(y = "Weighted Unifrac Distance") #

ggsave('figures/BetweenVsWithin.png')  
## Saving 7 x 5 in image

Figure: Open circles represent weighed unifrac distances associated with pairs of samples taken from the same set of participants, at different time points (“within”), and samples taken from different sets of participants (“between”). Black squares indicate the observed mean of the within and between values. Violins indicate distributions of bootstrapped mean values.

Variability at earliest sampling

psN2.wuf <- phyloseq::distance(psN2, method = "wunifrac")
psN2.pcoa <- capscale(psN2.wuf ~ 1)
# How much variance si explained by each weighted unifrac axis
# Note, ten axes cover 95% of the variance. 
# I'm not going to look beyond that for any test.
data.frame(eig = psN2.pcoa$CA$eig) %>%
rownames_to_column('axis') %>%
mutate(proportion = eig/sum(eig)) %>%
mutate(cumulative = cumsum(proportion))
##     axis         eig   proportion cumulative
## 1   MDS1 0.825438822 0.2888040980  0.2888041
## 2   MDS2 0.495612315 0.1734045743  0.4622087
## 3   MDS3 0.364619128 0.1275727475  0.5897814
## 4   MDS4 0.268169168 0.0938268864  0.6836083
## 5   MDS5 0.261743609 0.0915787154  0.7751870
## 6   MDS6 0.162405998 0.0568225245  0.8320095
## 7   MDS7 0.146142884 0.0511323948  0.8831419
## 8   MDS8 0.097842330 0.0342330228  0.9173750
## 9   MDS9 0.061705028 0.0215893226  0.9389643
## 10 MDS10 0.050336955 0.0176118673  0.9565762
## 11 MDS11 0.041072011 0.0143702533  0.9709464
## 12 MDS12 0.022613493 0.0079119971  0.9788584
## 13 MDS13 0.017745217 0.0062086872  0.9850671
## 14 MDS14 0.016323022 0.0057110902  0.9907782
## 15 MDS15 0.014802146 0.0051789668  0.9959571
## 16 MDS16 0.009677461 0.0033859449  0.9993431
## 17 MDS17 0.001877523 0.0006569069  1.0000000
my_breaks = c(1, 75, 250, 500, 1000,2000)
psN2 %>% mutate_phyloseq_sample(
                               mc41 = factor(medcode_hl(IgG_gp41_Month_0)),
                                                log120 = (IgG_Con.6.gp120.B_Month_12)) -> psN2_mod
psN2_mod%>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = mc41), size = 5, stroke = 1, shape = 21) +
coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
viridis::scale_fill_viridis(name = 'gp41 Baseline', direction = -1, discrete = TRUE) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) + 
theme_bw() -> wuford_gp41


psN2_mod %>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = log120), size = 5, stroke = 1, shape = 21) +
coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
viridis::scale_fill_viridis(name = 'gp120 Final', direction = 1, trans = "sqrt",
                           breaks = my_breaks, labels = my_breaks) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) + 
theme_bw() -> wuford_gp120

par <- options()
options(repr.plot.width=11, repr.plot.height= 4)
#g <- grid.arrange(wuford_gp41, wuford_gp120, ncol = 2)
g <- cowplot::plot_grid(wuford_gp41, wuford_gp120, ncol = 2, labels = c("A", "B"), label_size = 24)
g

#ggsave('figures/wunifrac_Agp41_Bgp120_pcoa.png', g, width = 8, height = 4)
#ggsave('figures/wunifrac_Agp41_Bgp120_pcoa.svg', g, width = 8, height = 4)
cowplot::save_plot('figures/wunifrac_Agp41_Bgp120_pcoa.png', g, base_width = 8, base_height = 4)
cowplot::save_plot('figures/wunifrac_Agp41_Bgp120_pcoa.svg', g, base_width = 8, base_height = 4)

Kernel Regression and Weighted Unifrac GLM

wufKN2 <- D2K(as.matrix(psN2.wuf))
muDoners <- unique(sample_data(psN2)$pub_id)
immune.data %>%
filter(pub_id %in% muDoners) %>%
filter(
    (type == 'IgG' & 
    antigen %in% ants1 &
    month %in% c(6.5,12)
    ) |
    (type %in% c('IgG', 'IgA') &
     antigen %in% ants2 &
     month %in% c(0,6.5,12)
    ) |
    type == 'CD4+' &
    antigen == 'ANY.ENV.PTEG' &
    month %in% c(6.5, 12)
      )-> use.immune
head(use.immune)
## # A tibble: 6 x 13
##   visitno rx_code type  antigen    mag mag_bl response   day month ct   
##     <int> <chr>   <chr> <chr>    <dbl>  <dbl>    <int> <int> <dbl> <chr>
## 1       9 T1      IgA   gp41      333.   110.        0   182   6.5 T    
## 2      12 T1      IgA   gp41        1    110.        0   364  12   T    
## 3       9 T1      IgA   p24       658    330.        0   182   6.5 T    
## 4      12 T1      IgA   p24       862.   330.        0   364  12   T    
## 5       9 T1      IgG   Con.6.… 27128.     1         1   182   6.5 T    
## 6      12 T1      IgG   Con.6.…  1488      1         1   364  12   T    
## # ... with 3 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>
# Do permanova and related tests to a variable of interest
# This function is pretty specific to this analysis, so I'm going to leave it
# here in the notebook file
CapVar <- function(x, nperm = 9999, transformation = medcode2, family = 'binomial'){
    ## Pull out the needed data
    
    psN2.wMDS <- psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
                    as.data.frame %>% rownames_to_column, by = 'rowname')
    
    medWuf <- NA
    rankWuf <- NA
    locPS <- phylo_join(psN2.wMDS, x, by = 'pub_id') 
    ydata0 <- sample_data(locPS)$mag
    yna <- is.na(ydata0)
    #loc.wuf <- wufKN2
    #loc.jsd <- jsdKN2
    ydata <- ydata0
    
    ydata <- ydata0[!yna]
    loc.wuf2 <- psN2.wuf %>% as.matrix %>% .[!yna, !yna]
    
    medWuf <- adonis(loc.wuf2 ~ transformation(ydata), permutations = nperm)
    #medWuf$aov.tab[1,c('R2', 'Pr(>F)')]
    
    ## Capscale returns the same results as adonis (permanova), but also gives some other interesting results
    
    medWufCap <- capscale(loc.wuf2 ~ transformation(ydata))
    capanova <- anova(medWufCap, permutations = nperm)
    
    samDf <- locPS %>% sample_data %>% as('data.frame') %>% rownames_to_column %>%
     left_join(
        vegan::scores(medWufCap, display = 'sites') %>% as.data.frame %>% dplyr::select(CAP1) %>%
        rownames_to_column, by = 'rowname') %>% .[!yna,]
    
#     # Is giving only positive results with CAP1, not sure why
#     glmAnova <- glm(medcode(ydata) ~  MDS1 + CAP1, data = samDf, family = 'binomial') %>% anova(test = "Chisq")
    loc_glm <- glm(transformation(ydata) ~  MDS1, data = samDf, family = family)
    glmAnova <- loc_glm %>% anova(test = "Chisq")
    #glmAnova['CAP1', 'Deviance']/out_capanova['NULL', 'Resid. Dev']
    
    ## check against mirkat
    loc.Kwuf2 <- wufKN2[!yna, !yna]
    mirkatP <- MiRKAT(y = transformation(ydata), Ks = loc.Kwuf2, out_type = "C", method = 'permutation', nperm = nperm)
    
    #list(medWuf, capanova, mirkatP)
    
    pred_pct <- predict(loc_glm, type = "response")
    pred_01 <- as.numeric(predict(loc_glm, type = "response") > 0.5)
    
    accuracy <- mean(transformation(ydata) == pred_01)
    
        null_glm <- update(loc_glm, ~1)

    # Canonical caluclation of McFadden's R2 for the GLM
    McFadden = 1- (logLik(loc_glm)/ logLik(null_glm))
    L.full = logLik(loc_glm)
    D.full = -2 * L.full
    L.base = logLik(null_glm)
    D.base = -2 * L.base
    n = dim(samDf)[1]
    Nagelkerke = (1 - exp((D.full - D.base)/n))/(1 - exp(-D.base/n))
    
    
    # A GLM of all weighted unifrac components
    
    
    data.frame(
        caps.P = capanova['Model', 'Pr(>F)'],
        adonisP = medWuf$aov.tab[1, 'Pr(>F)'],
        mir.P = mirkatP,
        caps.F = capanova['Model', 'F'],
        caps.R2 = medWufCap$CCA$tot.chi/medWufCap$tot.chi, 
        wuf1.P = glmAnova['MDS1', 'Pr(>Chi)'],
        wuf1.DR = glmAnova['MDS1', 'Deviance'] / glmAnova['NULL', 'Resid. Dev'],
        wuf1.McFadden = Nagelkerke,
        accuracy,
        wuf1.coef = coef(loc_glm)[2]
        #cap1.P = glmAnova['CAP1', 'Pr(>Chi)'],
        #cap1.R2 = glmAnova['CAP1', 'Deviance'] / glmAnova['NULL', 'Resid. Dev']
    )
    }
use.immune %>%
filter(type == 'IgG' & antigen == 'gp41'& month == 0 & ct == 'T') -> test.immune1
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- CapVar(test.immune1, nperm = 9999, transformation = medcode, family = 'binomial')
proc.time() - ptm
##    user  system elapsed 
##   0.681   0.012   0.694
tps
##      caps.P adonisP  mir.P   caps.F   caps.R2     wuf1.P   wuf1.DR
## MDS1 0.0437  0.0479 0.0466 1.860805 0.0900595 0.01437564 0.2061418
##      wuf1.McFadden  accuracy wuf1.coef
## MDS1     0.3312048 0.6666667  2.184936
use.immune %>%
filter(type == 'CD4+' & month == 6.5 & ct == 'T') -> test.immune.pteg
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- CapVar(test.immune.pteg, nperm = jnperm, transformation = medcode, family = 'binomial')
proc.time() - ptm
##    user  system elapsed 
##   5.673   0.024   5.697
tps
##      caps.P adonisP   mir.P   caps.F    caps.R2    wuf1.P    wuf1.DR
## MDS1 0.2514 0.24798 0.25199 1.214412 0.06065471 0.2176327 0.05229439
##      wuf1.McFadden  accuracy  wuf1.coef
## MDS1     0.0931634 0.5714286 -0.9372546
# Run above function against every relevant variable.
ptm <- proc.time()

use.immune %>% 
filter(ct == 'T') %>%
group_by(type, antigen, month) %>%
do(data.frame(CapVar(., nperm = jnperm))) -> permKernTable
## Warning: `env_bind_fns()` is deprecated as of rlang 0.3.0.
## Please use `env_bind_active()` instead.
## This warning is displayed once per session.
## Warning: `new_overscope()` is deprecated as of rlang 0.2.0.
## Please use `new_data_mask()` instead.
## This warning is displayed once per session.
## Warning: `overscope_eval_next()` is deprecated as of rlang 0.2.0.
## Please use `eval_tidy()` with a data mask instead.
## This warning is displayed once per session.
## Warning: `overscope_clean()` is deprecated as of rlang 0.2.0.
## This warning is displayed once per session.
permKernTable
## # A tibble: 20 x 13
## # Groups:   type, antigen, month [20]
##    type  antigen month  caps.P adonisP   mir.P caps.F caps.R2  wuf1.P
##    <chr> <chr>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
##  1 CD4+  ANY.EN…   6.5 0.252   0.249   0.251    1.21   0.0607 0.218  
##  2 CD4+  ANY.EN…  12   0.246   0.254   0.249    1.22   0.0642 0.228  
##  3 IgA   gp41      0   0.958   0.954   0.958    0.449  0.0233 0.651  
##  4 IgA   gp41      6.5 0.205   0.204   0.206    1.29   0.0644 0.143  
##  5 IgA   gp41     12   0.745   0.742   0.744    0.712  0.0384 0.855  
##  6 IgA   p24       0   0.895   0.893   0.893    0.552  0.0285 0.320  
##  7 IgA   p24       6.5 0.903   0.900   0.902    0.541  0.0279 0.650  
##  8 IgA   p24      12   0.385   0.383   0.385    1.04   0.0551 0.387  
##  9 IgG   Con.6.…   6.5 0.00427 0.00408 0.00431  2.82   0.130  0.00176
## 10 IgG   Con.6.…  12   0.0365  0.0374  0.0358   1.96   0.0943 0.0100 
## 11 IgG   gp41      0   0.0469  0.0521  0.0461   1.86   0.0901 0.0144 
## 12 IgG   gp41      6.5 0.0456  0.0463  0.0446   1.87   0.0904 0.0305 
## 13 IgG   gp41     12   0.651   0.643   0.651    0.793  0.0404 0.779  
## 14 IgG   gp70_B…   6.5 0.906   0.904   0.904    0.539  0.0278 0.619  
## 15 IgG   gp70_B…  12   0.0350  0.0383  0.0346   1.97   0.0948 0.0143 
## 16 IgG   p24       0   0.214   0.215   0.218    1.28   0.0635 0.444  
## 17 IgG   p24       6.5 0.414   0.413   0.420    1.01   0.0538 0.397  
## 18 IgG   p24      12   0.287   0.280   0.286    1.17   0.0583 0.159  
## 19 IgG   ZM96.g…   6.5 0.0165  0.0164  0.0167   2.26   0.107  0.00897
## 20 IgG   ZM96.g…  12   0.295   0.292   0.298    1.15   0.0577 0.162  
## # ... with 4 more variables: wuf1.DR <dbl>, wuf1.McFadden <dbl>,
## #   accuracy <dbl>, wuf1.coef <dbl>
proc.time() - ptm
##    user  system elapsed 
## 116.813   0.280 117.103

The above function runs several extra tests. Results as follows:

type antigen visitno - things we run over

caps.P - Capscale test asks whether if we rotate things a bit and then try to use the best axis to compare to the data. Its similar to the wuf1.P value, but with some rotation

adonisP - p-value for a permanova test. Similar to mirkat p-value. One key exception is that igg_gp41_Month_0 falls on different sides of the 0.05 threshold.

mir.P is the p value for the kernel regression test, as run in the MiRKAT package. (Zhao et al., 2015)

caps.F and caps R2 are the f and r squared values for the capscale test.

wuf.P - is the p value of a glm comparing weighted unifrac component one against variables of interest. This test appears to always be statistically significantly positive when the mirkat test is positve.

wuf1.DR - one way of calculating an R2 value from a glm. We devide the deviance by the residual deviance

wuf1.McFadden - is a McFadden’s pseudo R^2. This turns out to be identical to the previous calculation.

accuracy - the fraction of the time that the glm predicts something falls above or below the median correctly. This turns out to not be super informative. Everything has around a 60% accuracy.

wuf1.coef - the coefficient of the glm model. The sign is relevant. Things with postive sign are associated with high values of weighted unifrac axis 1.

# Clean up so we just see the results of the kernel regression 
concisePermKernTable <- permKernTable %>% ungroup %>%
mutate(Kernel_Q = p2q(mir.P), MDS1_Q = p2q(wuf1.P)) %>%
dplyr::select(Type = type, Antigen = antigen,Month = month, Kernel_P = mir.P, Kernel_Q,
              MDS1_P = wuf1.P, MDS1_Q, GlmMDS1_R2 = wuf1.McFadden, MDS1_Coef = wuf1.coef) %>%
as.data.frame %>% 
pass -> concisePermKernTable
write.csv(format(concisePermKernTable, digits = 3), 'tables/concisePermkernTable.csv')

Table 1

# export conditionally formatted table as html

colNames1 = c(' ' = 3, 'Kernel' = 2, 'MDS' = 4)
colNames2 = c('Type', 'Antigen', 'Month', 'P', 'Q', 'P', 'Q', 'R2', 'Coef' )

concisePermKernTable %>%
mutate(
    # this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
    MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "html",
                           bold = ifelse(Kernel_P < 0.05, 
                                         T,
                                         F),
                          italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
                                        T,
                                         F),
                          background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
                         ),
    Kernel_P = cell_spec(format_round(Kernel_P, 3), "html",
                                  bold = ifelse(Kernel_P < 0.05, T, F),
                                  background = ifelse(Kernel_P < 0.05, 'yellow', '')
                                 ),
    Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "html",
                                  bold = ifelse(Kernel_Q < 0.2, T, F),
                                  background = ifelse(Kernel_Q < 0.2, 'lightyellow', '')
                                 ),
     MDS1_P  = cell_spec(format_round(MDS1_P, 3), "html",
                                  bold = ifelse(MDS1_P < 0.05, T, F),
                                  background = ifelse(MDS1_P < 0.05, 'yellow', '')
                                 ),
    MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "html",
                                  bold = ifelse(MDS1_Q < 0.2, T, F),
                                  background = ifelse(MDS1_Q < 0.2, 'lightyellow', '')
                                 ),
    #Month = cell_spec(format_round(Month,0), "html")
    Month = cell_spec(Month, "html")

    
      ) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%

kable("html", escape = F, digits = 3, align = 'c', col.names = colNames2) %>%
kable_styling("striped", "hover", full_width = F) %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") -> concisePermKernTable.html

concisePermKernTable.html
Kernel
MDS
Type Antigen Month P Q P Q R2 Coef
CD4+ Any ENV PTEG 6.5 0.251 0.497 0.218 0.125 0.093 -0.937
12 0.249 0.497 0.228 0.125 0.094 -0.920
IgA gp41 0 0.958 0.958 0.651 0.219 0.013 0.333
6.5 0.206 0.497 0.143 0.109 0.129 -1.133
12 0.744 0.930 0.855 0.259 0.002 -0.136
p24 0 0.893 0.952 0.320 0.161 0.061 0.752
6.5 0.902 0.952 0.650 0.219 0.013 -0.333
12 0.385 0.593 0.387 0.172 0.049 -0.657
IgG Con.6.gp120.B 6.5 0.004 0.086 0.002 0.011 0.497 -3.104
12 0.036 0.154 0.010 0.017 0.361 -2.289
gp41 0 0.046 0.154 0.014 0.017 0.331 2.185
6.5 0.045 0.154 0.030 0.031 0.267 -1.809
12 0.651 0.868 0.779 0.248 0.005 -0.205
gp70 B.CaseA V1-V2 6.5 0.904 0.952 0.619 0.219 0.016 -0.365
12 0.035 0.154 0.014 0.017 0.332 -2.135
p24 0 0.218 0.497 0.444 0.179 0.037 -0.567
6.5 0.420 0.600 0.397 0.172 0.047 -0.749
12 0.286 0.497 0.159 0.109 0.120 -1.085
ZM96.gp140 6.5 0.017 0.154 0.009 0.017 0.370 -2.338
12 0.298 0.497 0.162 0.109 0.118 -1.076
concisePermKernTable.html %>% cat(file = 'tables/concisePermkernTable.html')

Latex version of the same table

docHead <- "\\documentclass[12pt]{article} % use larger type; default would be 10pt

\\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)
\\usepackage{booktabs}
\\usepackage{longtable}
\\usepackage{array}
\\usepackage{multirow}
\\usepackage[table]{xcolor}
\\usepackage{wrapfig}
\\usepackage{float}
\\usepackage{colortbl}
\\usepackage{pdflscape}
\\usepackage{tabu}
\\usepackage{threeparttable}
\\usepackage{threeparttablex}
\\usepackage[normalem]{ulem}
\\usepackage{makecell}

\\definecolor{green}{rgb}{1, 1, .9}

\\begin{document}
"

docTail <- "\\end{document}
"
# Make latex table

concisePermKernTable %>%
mutate(
    # this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
    MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "latex",
                           bold = ifelse(MDS1_P < 0.05, 
                                         T,
                                         F),
                          italic = ifelse(MDS1_P < 0.05 & MDS1_Coef < 0,
                                        T,
                                         F)),
    Kernel_P = cell_spec(format_round(Kernel_P, 3), "latex",
                                  bold = ifelse(Kernel_P < 0.05, T, F),
                                  background = ifelse(Kernel_P < 0.05, 'yellow', 'white')
                                 ),
    Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "latex",
                                  bold = ifelse(Kernel_Q < 0.2, T, F),
                                  background = ifelse(Kernel_Q < 0.2, 'green', 'white')
                                 ),
     MDS1_P  = cell_spec(format_round(MDS1_P, 3), "latex",
                                  bold = ifelse(MDS1_P < 0.05, T, F),
                                  background = ifelse(MDS1_P < 0.05, 'yellow', 'white')
                                 ),
    MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "latex",
                                  bold = ifelse(MDS1_Q < 0.2, T, F),
                                  background = ifelse(MDS1_Q < 0.2, 'green', 'white')
                                 ),
    #Month = cell_spec(format_round(Month,0), "html")
    Month = cell_spec(Month, "latex")

    
      ) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable

toTable %>% 
kable("latex", escape = F, digits = 3, align = 'c', col.names = colNames2, booktabs = T) %>%
kable_styling(position = "left") %>%

add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> concisePermKernTable.tex
# Print latex table to tex file

cat(docHead, concisePermKernTable.tex, docTail, file = 'tables/concisePermkernTable1.tex')
concisePermKernTable %>% filter(Kernel_P < 0.05) -> shortPermkernTable
shortPermkernTable
##   Type            Antigen Month Kernel_P Kernel_Q      MDS1_P     MDS1_Q
## 1  IgG      Con.6.gp120.B   6.5  0.00431   0.0862 0.001759476 0.01064944
## 2  IgG      Con.6.gp120.B  12.0  0.03584   0.1538 0.010022385 0.01740207
## 3  IgG               gp41   0.0  0.04614   0.1538 0.014375638 0.01740207
## 4  IgG               gp41   6.5  0.04457   0.1538 0.030461387 0.03072856
## 5  IgG gp70_B.CaseA_V1_V2  12.0  0.03465   0.1538 0.014277612 0.01740207
## 6  IgG         ZM96.gp140   6.5  0.01671   0.1538 0.008971440 0.01740207
##   GlmMDS1_R2 MDS1_Coef
## 1  0.4969907 -3.103554
## 2  0.3612921 -2.289438
## 3  0.3312048  2.184936
## 4  0.2667186 -1.808614
## 5  0.3317812 -2.134684
## 6  0.3704046 -2.338385
write.csv(format(shortPermkernTable, digits = 3), 'tables/shortPermkernTable.csv')

Q-Q Plot

Of kernel regression P values

my_runif <- function(Len){
    loc_runif <- runif(n = Len)
    sort_loc_runif <- sort(loc_runif)
    data.frame(case = 1:Len, relement = sort_loc_runif)
}

calc_bound <- function(df, bound){
    quantile(df$relement, bound)
}

make_qqdata <- function(pvec, nboot = 10000){
    locP <- pvec

LP <- length(locP)
#1:10 %>% map(runif, n = LP)
sortedP <- sort(locP)
exp2 <- 1:length(locP)/length(locP)
sortedExpP <- sort(exp2)

random_pvalues <- data.frame(iter = 1:nboot) %>%
mutate(rand = map(iter, ~my_runif(Len = LP))) %>% unnest %>%
nest(-case) %>%
mutate(lb = map_dbl(data, ~calc_bound(., 0.025)),
      ub = map_dbl(data, ~calc_bound(.,0.975))) %>% dplyr::select (-data)

qqdata <- bind_cols(sortedP = sortedP, sortedExpP = sortedExpP, random_pvalues)

return(qqdata)
}
qqdata_permKernMir <- make_qqdata(permKernTable$mir.P)
qqdata_permKernMir %>% str
## Classes 'tbl_df', 'tbl' and 'data.frame':    20 obs. of  5 variables:
##  $ sortedP   : num  0.00431 0.01671 0.03465 0.03584 0.04457 ...
##  $ sortedExpP: num  0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 ...
##  $ case      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ lb        : num  0.00123 0.01314 0.03293 0.05827 0.08676 ...
##  $ ub        : num  0.169 0.247 0.317 0.38 0.441 ...
options(repr.plot.width=6, repr.plot.height= 6)
qqdata_permKernMir %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) + 
labs(x = expression(paste("Expected ", italic("p"),"-value")), y = expression(paste("Observed ", italic("p"),"-value")))

ggsave('figures/qq_permKernMir.png')
## Saving 7 x 5 in image
qqdata_permKernMir %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) + geom_line(aes(y = lb), colour = "grey40") + geom_line(aes(y = ub), colour = "grey40") + scale_y_log10() + scale_x_log10() + 
  labs(x = expression(paste("Expected ", italic("p"),"-value")), y = expression(paste("Observed ", italic("p"),"-value")))

We see from the qqplot that we generally fall below the 1:1 line, suggesting that our P values are lower than we would expect from chance. To make the grey lines, I sampled 95% confidence intervals of random draws from the uniform distribution (essentially random p values). The data points sometimes, but not always fall below the lower one of these. However, we care about all of the p values, not wheter eaach individual is more unusual than we would expect by chance. I’ll replort just the regular qqplot as a supplemental figure.

As above, but this time with gaussian - continuous dependent variables

ptm = proc.time()
tps <- CapVar(test.immune1, nperm = 9999, transformation = function(x){jac_box_cox_2(x)}, family = 'gaussian')
proc.time() - ptm
##    user  system elapsed 
##   0.648   0.000   0.648
tps
##      caps.P adonisP  mir.P   caps.F    caps.R2     wuf1.P   wuf1.DR
## MDS1 0.2222  0.2538 0.2194 1.311871 0.06520801 0.03795074 0.1848023
##      wuf1.McFadden accuracy wuf1.coef
## MDS1     0.1969076        0  0.700859
# Run above function against every relevant variable.
ptm <- proc.time()

use.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, month) %>%
do(data.frame(CapVar(., nperm = jnperm,
                     transformation = function(x){jac_box_cox_2(x)},
                     family = 'gaussian'))) -> permKernTableGaus
permKernTableGaus
## # A tibble: 20 x 13
## # Groups:   type, antigen, month [20]
##    type  antigen month caps.P adonisP   mir.P caps.F caps.R2  wuf1.P
##    <chr> <chr>   <dbl>  <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
##  1 CD4+  ANY.EN…   6.5 0.603  6.02e-1 0.602    0.827  0.0421 5.97e-1
##  2 CD4+  ANY.EN…  12   0.185  1.94e-1 0.185    1.36   0.0707 3.29e-1
##  3 IgA   gp41      0   0.990  9.88e-1 0.989    0.311  0.0163 6.33e-1
##  4 IgA   gp41      6.5 0.411  4.07e-1 0.411    1.00   0.0505 6.01e-1
##  5 IgA   gp41     12   0.655  6.52e-1 0.651    0.755  0.0406 5.04e-1
##  6 IgA   p24       0   0.953  9.53e-1 0.954    0.417  0.0217 4.33e-1
##  7 IgA   p24       6.5 0.976  9.73e-1 0.976    0.373  0.0194 8.57e-1
##  8 IgA   p24      12   0.574  5.75e-1 0.575    0.852  0.0456 3.09e-1
##  9 IgG   Con.6.…   6.5 0.0736 7.60e-2 0.0731   1.72   0.0839 3.16e-2
## 10 IgG   Con.6.…  12   0.0007 6.00e-4 0.00068  3.55   0.159  1.43e-5
## 11 IgG   gp41      0   0.221  2.48e-1 0.221    1.31   0.0652 3.80e-2
## 12 IgG   gp41      6.5 0.220  2.25e-1 0.219    1.29   0.0643 7.98e-2
## 13 IgG   gp41     12   0.266  2.71e-1 0.263    1.23   0.0613 1.73e-1
## 14 IgG   gp70_B…   6.5 0.287  2.93e-1 0.282    1.19   0.0595 2.06e-1
## 15 IgG   gp70_B…  12   0.110  1.12e-1 0.111    1.55   0.0763 8.34e-2
## 16 IgG   p24       0   0.754  7.45e-1 0.753    0.636  0.0327 9.07e-1
## 17 IgG   p24       6.5 0.0268 2.61e-2 0.0272   2.15   0.108  5.30e-2
## 18 IgG   p24      12   0.411  4.14e-1 0.413    1.02   0.0513 1.33e-1
## 19 IgG   ZM96.g…   6.5 0.0293 2.96e-2 0.0290   2.24   0.106  1.69e-2
## 20 IgG   ZM96.g…  12   0.204  2.00e-1 0.198    1.32   0.0657 7.77e-2
## # ... with 4 more variables: wuf1.DR <dbl>, wuf1.McFadden <dbl>,
## #   accuracy <dbl>, wuf1.coef <dbl>
proc.time() - ptm
##    user  system elapsed 
## 117.617   0.532 118.160
# Clean up so we just see the results of the kernel regression 
concisePermKernTableGaus <- permKernTableGaus %>% ungroup %>%
mutate(Kernel_Q = p2q(mir.P), MDS1_Q = p2q(wuf1.P)) %>%
dplyr::select(Type = type, Antigen = antigen, Month = month, Kernel_P = mir.P, Kernel_Q,
              MDS1_P = wuf1.P, MDS1_Q, MDS1_R2 = wuf1.McFadden, MDS1_Coef = wuf1.coef) %>%
as.data.frame %>% 
pass 

concisePermKernTableGaus
##    Type            Antigen Month Kernel_P  Kernel_Q       MDS1_P
## 1  CD4+       ANY.ENV.PTEG   6.5  0.60213 0.8028400 5.971764e-01
## 2  CD4+       ANY.ENV.PTEG  12.0  0.18544 0.4920667 3.291117e-01
## 3   IgA               gp41   0.0  0.98896 0.9889600 6.330939e-01
## 4   IgA               gp41   6.5  0.41053 0.6358769 6.006684e-01
## 5   IgA               gp41  12.0  0.65053 0.8131625 5.043440e-01
## 6   IgA                p24   0.0  0.95380 0.9889600 4.331569e-01
## 7   IgA                p24   6.5  0.97610 0.9889600 8.573830e-01
## 8   IgA                p24  12.0  0.57478 0.8028400 3.091250e-01
## 9   IgG      Con.6.gp120.B   6.5  0.07311 0.3655500 3.164215e-02
## 10  IgG      Con.6.gp120.B  12.0  0.00068 0.0136000 1.431333e-05
## 11  IgG               gp41   0.0  0.22143 0.4920667 3.795074e-02
## 12  IgG               gp41   6.5  0.21873 0.4920667 7.975175e-02
## 13  IgG               gp41  12.0  0.26339 0.5136000 1.731298e-01
## 14  IgG gp70_B.CaseA_V1_V2   6.5  0.28248 0.5136000 2.062894e-01
## 15  IgG gp70_B.CaseA_V1_V2  12.0  0.11097 0.4438800 8.341959e-02
## 16  IgG                p24   0.0  0.75279 0.8856353 9.069707e-01
## 17  IgG                p24   6.5  0.02725 0.1932667 5.296858e-02
## 18  IgG                p24  12.0  0.41332 0.6358769 1.334352e-01
## 19  IgG         ZM96.gp140   6.5  0.02899 0.1932667 1.691120e-02
## 20  IgG         ZM96.gp140  12.0  0.19822 0.4920667 7.767448e-02
##          MDS1_Q      MDS1_R2   MDS1_Coef
## 1  0.3441069304 0.0154346775 -0.19622223
## 2  0.2476839187 0.0535522002 -0.35622365
## 3  0.3441069304 0.0126283362  0.17748943
## 4  0.3441069304 0.0151466567 -0.19438279
## 5  0.3289526950 0.0257545280  0.25159262
## 6  0.3027018097 0.0333728299  0.28853330
## 7  0.4414881726 0.0018079718 -0.06715761
## 8  0.2476839187 0.0579135356 -0.37727770
## 9  0.0928235317 0.2083288787 -0.72089857
## 10 0.0001400356 0.5303144044 -1.15018086
## 11 0.0928235317 0.1969076139  0.70085904
## 12 0.1020177828 0.1482127646 -0.60805411
## 13 0.1693829682 0.0948033344 -0.48630779
## 14 0.1834771570 0.0826277473 -0.45400682
## 15 0.1020177828 0.1451699474 -0.60178005
## 16 0.4436710261 0.0007652868  0.04369297
## 17 0.1020177828 0.1835313650 -0.78663031
## 18 0.1450526857 0.1129112051 -0.53072304
## 19 0.0827260298 0.2460702885 -0.78348198
## 20 0.1020177828 0.1499943359 -0.61169771
write.csv(format(concisePermKernTableGaus, digits = 3), 'tables/concisePermkernTableGaus.csv')

Table S1

# export conditionally formatted table as html
concisePermKernTableGaus %>%
mutate(
    # this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
       MDS1_Coef = cell_spec(format_round(MDS1_Coef, digits = 3), "html",
                           bold = ifelse(Kernel_P < 0.05, 
                                         T,
                                         F),
                          italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
                                        T,
                                         F),
                          background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
                         ),
    Kernel_P = cell_spec(format_round(Kernel_P, 3), "html",
                                  bold = ifelse(Kernel_P < 0.05, T, F),
                                  background = ifelse(Kernel_P < 0.05, 'yellow', '')
                                 ),
    Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "html",
                                  bold = ifelse(Kernel_Q < 0.2, T, F),
                                  background = ifelse(Kernel_Q < 0.2, 'lightyellow', '')
                                 ),
     MDS1_P  = cell_spec(format_round(MDS1_P, 3), "html",
                                  bold = ifelse(MDS1_P < 0.05, T, F),
                                  background = ifelse(MDS1_P < 0.05, 'yellow', '')
                                 ),
    MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "html",
                                  bold = ifelse(MDS1_Q < 0.2, T, F),
                                  background = ifelse(MDS1_Q < 0.2, 'lightyellow', '')
                                 ),
    Month = cell_spec(Month, "html")

    
      ) %>%

mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable

toTable %>%

kable("html", escape = F, digits = 3, align = 'c', col.names = colNames2) %>%
kable_styling("striped", "hover", full_width = F) %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") -> concisePermKernTableGaus.html

concisePermKernTableGaus.html
Kernel
MDS
Type Antigen Month P Q P Q R2 Coef
CD4+ Any ENV PTEG 6.5 0.602 0.803 0.597 0.344 0.015 -0.196
12 0.185 0.492 0.329 0.248 0.054 -0.356
IgA gp41 0 0.989 0.989 0.633 0.344 0.013 0.177
6.5 0.411 0.636 0.601 0.344 0.015 -0.194
12 0.651 0.813 0.504 0.329 0.026 0.252
p24 0 0.954 0.989 0.433 0.303 0.033 0.289
6.5 0.976 0.989 0.857 0.441 0.002 -0.067
12 0.575 0.803 0.309 0.248 0.058 -0.377
IgG Con.6.gp120.B 6.5 0.073 0.366 0.032 0.093 0.208 -0.721
12 0.001 0.014 0.000 0.000 0.530 -1.150
gp41 0 0.221 0.492 0.038 0.093 0.197 0.701
6.5 0.219 0.492 0.080 0.102 0.148 -0.608
12 0.263 0.514 0.173 0.169 0.095 -0.486
gp70 B.CaseA V1-V2 6.5 0.282 0.514 0.206 0.183 0.083 -0.454
12 0.111 0.444 0.083 0.102 0.145 -0.602
p24 0 0.753 0.886 0.907 0.444 0.001 0.044
6.5 0.027 0.193 0.053 0.102 0.184 -0.787
12 0.413 0.636 0.133 0.145 0.113 -0.531
ZM96.gp140 6.5 0.029 0.193 0.017 0.083 0.246 -0.783
12 0.198 0.492 0.078 0.102 0.150 -0.612
concisePermKernTableGaus.html %>% cat(file = 'tables/concisePermkernTableGaus.html')
concisePermKernTableGaus %>% filter(Kernel_P < 0.05) -> shortPermkernTableGaus
shortPermkernTableGaus
##   Type       Antigen Month Kernel_P  Kernel_Q       MDS1_P       MDS1_Q
## 1  IgG Con.6.gp120.B  12.0  0.00068 0.0136000 1.431333e-05 0.0001400356
## 2  IgG           p24   6.5  0.02725 0.1932667 5.296858e-02 0.1020177828
## 3  IgG    ZM96.gp140   6.5  0.02899 0.1932667 1.691120e-02 0.0827260298
##     MDS1_R2  MDS1_Coef
## 1 0.5303144 -1.1501809
## 2 0.1835314 -0.7866303
## 3 0.2460703 -0.7834820
write.csv(format(shortPermkernTableGaus, digits = 3), 'tables/shortPermkernTable.csv')
# Make latex table

concisePermKernTableGaus %>%
mutate(
    # this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
    MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "html",
                           bold = ifelse(Kernel_P < 0.05, 
                                         T,
                                         F),
                          italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
                                        T,
                                         F),
                          background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
                         ),
    Kernel_P = cell_spec(format_round(Kernel_P, 3), "latex",
                                  bold = ifelse(Kernel_P < 0.05, T, F),
                                  background = ifelse(Kernel_P < 0.05, 'yellow', 'white')
                                 ),
    Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "latex",
                                  bold = ifelse(Kernel_Q < 0.2, T, F),
                                  background = ifelse(Kernel_Q < 0.2, 'green', 'white')
                                 ),
     MDS1_P  = cell_spec(format_round(MDS1_P, 3), "latex",
                                  bold = ifelse(MDS1_P < 0.05, T, F),
                                  background = ifelse(MDS1_P < 0.05, 'yellow', 'white')
                                 ),
    MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "latex",
                                  bold = ifelse(MDS1_Q < 0.2, T, F),
                                  background = ifelse(MDS1_Q < 0.2, 'green', 'white')
                                 ),
    #Month = cell_spec(format_round(Month,0), "html")
    Month = cell_spec(Month, "latex")

    
      ) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable

toTable %>% 
kable("latex", escape = F, digits = 3, align = 'c', col.names = colNames2, booktabs = T) %>%
kable_styling(position = "left") %>%

add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> concisePermKernTableGaus.tex
# Print latex table to tex file

cat(docHead, concisePermKernTableGaus.tex, docTail, file = 'tables/concisePermkernTableGaus.tex')
### QQplot for kernel regression data
qqdata_permKernMirGaus <- make_qqdata(permKernTableGaus$mir.P)
options(repr.plot.width=6, repr.plot.height= 6)
qqdata_permKernMirGaus %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) + 
labs(x = expression(paste("Expected ", italic("p"),"-value")), y = expression(paste("Observed ", italic("p"),"-value")))

ggsave('figures/qq_permKernMirGaus.png')
## Saving 7 x 5 in image

Chi Squared test for statistical associations between each pair of immune variables

use.immune %>% dplyr::select(pub_id, visitno, type, antigen, mag) -> tmp
full_join(tmp, tmp, by = 'pub_id') %>% 
group_by(visitno.x, type.x, antigen.x, visitno.y, type.y, antigen.y) %>%
nest %>%
mutate(x2 = map(data, function(df){unwarn(chisq.test(df$mag.x, df$mag.y))})) %>%
mutate(glance = map(x2, glance)) %>%
dplyr::select(-data, -x2) %>%
unnest(glance) %>%
#mutate(q.value = p2q(p.value)) %>% # reurns NaNs
pass -> compareImmuneX2
compareImmuneX2 %>% filter(
    type.x == 'IgG' &
    antigen.x == 'gp41' &
    type.y == 'IgG' &
    antigen.y == 'gp41'
)
## # A tibble: 9 x 10
##   visitno.x type.x antigen.x visitno.y type.y antigen.y statistic p.value
##       <int> <chr>  <chr>         <int> <chr>  <chr>         <dbl>   <dbl>
## 1         9 IgG    gp41              9 IgG    gp41            420  0.236 
## 2         9 IgG    gp41             12 IgG    gp41            378  0.247 
## 3         9 IgG    gp41              2 IgG    gp41            420  0.236 
## 4        12 IgG    gp41              9 IgG    gp41            378  0.247 
## 5        12 IgG    gp41             12 IgG    gp41            378  0.0207
## 6        12 IgG    gp41              2 IgG    gp41            378  0.247 
## 7         2 IgG    gp41              9 IgG    gp41            420  0.236 
## 8         2 IgG    gp41             12 IgG    gp41            378  0.247 
## 9         2 IgG    gp41              2 IgG    gp41            420  0.236 
## # ... with 2 more variables: parameter <int>, method <chr>
compareImmuneX2 %>%
filter(type.x == 'IgG' & type.y == 'IgG' & antigen.x != antigen.y) %>%
write_csv('tables/chisq_IgG_comparasons.csv')

MDS GLM for each other MDS Axis

psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
                    as.data.frame %>% rownames_to_column, by = 'rowname') %>%
sample_data %>% as('data.frame') %>% rownames_to_column -> hereSam
data_frame(formula = paste("transformation(ydata) ~ MDS", 1:10, sep = ""))
## # A tibble: 10 x 1
##    formula                      
##    <chr>                        
##  1 transformation(ydata) ~ MDS1 
##  2 transformation(ydata) ~ MDS2 
##  3 transformation(ydata) ~ MDS3 
##  4 transformation(ydata) ~ MDS4 
##  5 transformation(ydata) ~ MDS5 
##  6 transformation(ydata) ~ MDS6 
##  7 transformation(ydata) ~ MDS7 
##  8 transformation(ydata) ~ MDS8 
##  9 transformation(ydata) ~ MDS9 
## 10 transformation(ydata) ~ MDS10
EachMDS <- function(x, nperm = 9999, transformation = medcode2, family = 'binomial'){
    ## Pull out the needed data
    
    psN2.wMDS <- psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
                    as.data.frame %>% rownames_to_column, by = 'rowname')
    
#     medWuf <- NA
#     rankWuf <- NA
    locPS <- phylo_join(psN2.wMDS, x, by = 'pub_id') 
    ydata0 <- sample_data(locPS)$mag
    yna <- is.na(ydata0)
    #loc.wuf <- wufKN2
    #loc.jsd <- jsdKN2
    ydata <- ydata0
    
    ydata <- ydata0[!yna]
    loc.wuf2 <- psN2.wuf %>% as.matrix %>% .[!yna, !yna]
    
     samDf <- locPS %>% sample_data %>% as('data.frame') %>% rownames_to_column %>%
    .[!yna,]

#     # Is giving only positive results with CAP1, not sure why
    loc_glm <- glm(as.formula("transformation(ydata) ~  MDS1"), data = samDf, family = family)
    glmAnova <- loc_glm %>% anova(test = "Chisq")
    
    # data_frame, rather than data.frame
    # https://stackoverflow.com/questions/48450308/iterating-over-formulas-in-purrr#48450308
    data_frame(formulaString = paste("transformation(ydata) ~ MDS", 1:10, sep = "")) %>%
     mutate(model = map(formulaString, function(fs){
         glm(as.formula(fs), data = samDf, family = family)})) %>%
    mutate(anova = map(model, anova)) %>%
    mutate(glance = map(model, glance)) %>%
    mutate(tidy = map(model, tidy)) %>%
    mutate(coef = map(model, ~ coef(summary(.))[2,])) %>%
    pass -> allmodels

    allmodels %>% dplyr::select("tidy") %>% unnest %>% filter(term != '(Intercept)')
    
 
    }
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- EachMDS(test.immune.pteg, nperm = 9999, transformation = medcode, family = 'binomial')
proc.time() - ptm
##    user  system elapsed 
##   0.145   0.000   0.145
tps
## # A tibble: 10 x 5
##    term  estimate std.error statistic p.value
##    <chr>    <dbl>     <dbl>     <dbl>   <dbl>
##  1 MDS1    -0.937     0.793    -1.18   0.237 
##  2 MDS2     0.447     0.758     0.590  0.555 
##  3 MDS3    -0.214     0.732    -0.292  0.771 
##  4 MDS4    -1.46      0.908    -1.61   0.108 
##  5 MDS5    -0.997     0.844    -1.18   0.237 
##  6 MDS6    -1.47      0.907    -1.62   0.104 
##  7 MDS7     0.267     0.736     0.363  0.717 
##  8 MDS8    -1.74      1.01     -1.73   0.0838
##  9 MDS9     0.360     0.741     0.487  0.627 
## 10 MDS10    0.784     0.787     0.996  0.319
use.immune %>%
group_by(type, antigen, month) %>%
nest %>%
mutate(coefs = map(data, ~ EachMDS(.))) %>%
dplyr::select(-data) %>% unnest(coefs) -> glmMDScoefs
ants1
## [1] "Con.6.gp120.B"      "ZM96.gp140"         "gp70_B.CaseA_V1_V2"
glmMDScoefs %>%
gather(key = "key", value = "value", estimate:p.value) %>%
filter(key == "p.value") %>%
spread(key = term, value = value) %>%
dplyr::select(-key, -MDS10, MDS10) %>%
dplyr::rename(Type = type, Antigen = antigen, Month = month) %>%
mutate(Type = factor(Type, levels = c( "IgA", "IgG",  "CD4+"))) %>%
mutate(Antigen = factor(Antigen, levels = c(ants1, ants2, "ANY.ENV.PTEG"))) %>%
#Clean up labels
mutate(Antigen = stringr::str_replace_all(Antigen, "_", " ")) %>%
mutate(Antigen = stringr::str_replace_all(Antigen, "V1 V2", "V1-V2")) %>%
mutate(Antigen = stringr::str_replace_all(Antigen, "ANY.ENV.PTEG", "Any ENV PTEG")) %>%
arrange(Type) -> allMDS
allMDS
## # A tibble: 20 x 13
##    Type  Antigen Month   MDS1  MDS2  MDS3   MDS4  MDS5   MDS6   MDS7   MDS8
##    <fct> <chr>   <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>
##  1 IgA   gp41      0   0.653  0.607 0.401 0.702  0.786 0.650  0.242  0.349 
##  2 IgA   gp41      6.5 0.168  0.368 0.701 0.954  0.112 0.142  0.456  0.475 
##  3 IgA   gp41     12   0.855  0.201 0.462 0.408  0.994 0.798  0.469  0.106 
##  4 IgA   p24       0   0.335  0.629 0.520 0.939  0.666 0.657  0.148  0.853 
##  5 IgA   p24       6.5 0.652  0.281 0.681 0.414  0.438 0.599  0.999  0.199 
##  6 IgA   p24      12   0.398  0.724 0.227 0.202  0.128 0.465  0.445  0.156 
##  7 IgG   Con.6.…   6.5 0.0175 0.536 0.420 0.890  0.325 0.427  0.853  0.806 
##  8 IgG   Con.6.…  12   0.0315 0.878 0.377 0.635  0.989 0.540  0.569  0.374 
##  9 IgG   gp41      0   0.0399 0.894 0.304 0.967  0.442 0.231  0.504  0.595 
## 10 IgG   gp41      6.5 0.0567 0.274 0.226 0.891  0.858 0.165  0.601  0.426 
## 11 IgG   gp41     12   0.779  0.310 0.443 0.228  0.680 0.268  0.995  0.0391
## 12 IgG   gp70 B…   6.5 0.621  0.853 0.300 0.458  0.507 0.690  0.506  0.161 
## 13 IgG   gp70 B…  12   0.0373 0.665 0.808 0.447  0.318 0.296  0.743  0.143 
## 14 IgG   p24       0   0.451  0.231 0.986 0.0876 0.149 0.0510 0.882  0.507 
## 15 IgG   p24       6.5 0.404  0.828 0.243 0.250  0.689 0.0819 0.180  0.846 
## 16 IgG   p24      12   0.182  0.721 0.237 0.676  0.933 0.371  0.0418 0.423 
## 17 IgG   ZM96.g…   6.5 0.0299 0.631 0.244 0.750  0.234 0.949  0.890  0.994 
## 18 IgG   ZM96.g…  12   0.186  0.353 0.286 0.722  0.889 0.409  0.459  0.376 
## 19 CD4+  Any EN…   6.5 0.237  0.555 0.771 0.108  0.237 0.104  0.717  0.0838
## 20 CD4+  Any EN…  12   0.248  0.565 0.434 0.103  0.117 0.856  0.861  0.128 
## # ... with 2 more variables: MDS9 <dbl>, MDS10 <dbl>

Table S2

allMDS %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
as.character() -> allMDS.html

allMDS %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
collapse_rows(columns = 1:2, latex_hline = "full")
Type Antigen Month MDS1 MDS2 MDS3 MDS4 MDS5 MDS6 MDS7 MDS8 MDS9 MDS10
IgA gp41 0.0 0.653 0.607 0.401 0.702 0.786 0.650 0.242 0.349 0.038 0.700
6.5 0.168 0.368 0.701 0.954 0.112 0.142 0.456 0.475 0.789 0.864
12.0 0.855 0.201 0.462 0.408 0.994 0.798 0.469 0.106 0.079 0.232
p24 0.0 0.335 0.629 0.520 0.939 0.666 0.657 0.148 0.853 0.921 0.517
6.5 0.652 0.281 0.681 0.414 0.438 0.599 0.999 0.199 0.881 0.351
12.0 0.398 0.724 0.227 0.202 0.128 0.465 0.445 0.156 0.516 0.159
IgG Con.6.gp120.B 6.5 0.017 0.536 0.420 0.890 0.325 0.427 0.853 0.806 0.634 0.250
12.0 0.031 0.878 0.377 0.635 0.989 0.540 0.569 0.374 0.686 0.299
gp41 0.0 0.040 0.894 0.304 0.967 0.442 0.231 0.504 0.595 0.856 0.509
6.5 0.057 0.274 0.226 0.891 0.858 0.165 0.601 0.426 0.212 0.606
12.0 0.779 0.310 0.443 0.228 0.680 0.268 0.995 0.039 0.357 0.054
gp70 B.CaseA V1-V2 6.5 0.621 0.853 0.300 0.458 0.507 0.690 0.506 0.161 0.039 0.880
12.0 0.037 0.665 0.808 0.447 0.318 0.296 0.743 0.143 0.403 0.276
p24 0.0 0.451 0.231 0.986 0.088 0.149 0.051 0.882 0.507 0.384 0.439
6.5 0.404 0.828 0.243 0.250 0.689 0.082 0.180 0.846 0.360 0.101
12.0 0.182 0.721 0.237 0.676 0.933 0.371 0.042 0.423 0.716 0.147
ZM96.gp140 6.5 0.030 0.631 0.244 0.750 0.234 0.949 0.890 0.994 0.090 0.504
12.0 0.186 0.353 0.286 0.722 0.889 0.409 0.459 0.376 0.021 0.189
CD4+ Any ENV PTEG 6.5 0.237 0.555 0.771 0.108 0.237 0.104 0.717 0.084 0.627 0.319
12.0 0.248 0.565 0.434 0.103 0.117 0.856 0.861 0.128 0.773 0.723
allMDS.html
## [1] "<table>\n <thead>\n  <tr>\n   <th style=\"text-align:center;\"> Type </th>\n   <th style=\"text-align:center;\"> Antigen </th>\n   <th style=\"text-align:center;\"> Month </th>\n   <th style=\"text-align:center;\"> MDS1 </th>\n   <th style=\"text-align:center;\"> MDS2 </th>\n   <th style=\"text-align:center;\"> MDS3 </th>\n   <th style=\"text-align:center;\"> MDS4 </th>\n   <th style=\"text-align:center;\"> MDS5 </th>\n   <th style=\"text-align:center;\"> MDS6 </th>\n   <th style=\"text-align:center;\"> MDS7 </th>\n   <th style=\"text-align:center;\"> MDS8 </th>\n   <th style=\"text-align:center;\"> MDS9 </th>\n   <th style=\"text-align:center;\"> MDS10 </th>\n  </tr>\n </thead>\n<tbody>\n  <tr>\n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"6\"> IgA </td>\n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> gp41 </td>\n   <td style=\"text-align:center;\"> 0.0 </td>\n   <td style=\"text-align:center;\"> 0.653 </td>\n   <td style=\"text-align:center;\"> 0.607 </td>\n   <td style=\"text-align:center;\"> 0.401 </td>\n   <td style=\"text-align:center;\"> 0.702 </td>\n   <td style=\"text-align:center;\"> 0.786 </td>\n   <td style=\"text-align:center;\"> 0.650 </td>\n   <td style=\"text-align:center;\"> 0.242 </td>\n   <td style=\"text-align:center;\"> 0.349 </td>\n   <td style=\"text-align:center;\"> 0.038 </td>\n   <td style=\"text-align:center;\"> 0.700 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 6.5 </td>\n   <td style=\"text-align:center;\"> 0.168 </td>\n   <td style=\"text-align:center;\"> 0.368 </td>\n   <td style=\"text-align:center;\"> 0.701 </td>\n   <td style=\"text-align:center;\"> 0.954 </td>\n   <td style=\"text-align:center;\"> 0.112 </td>\n   <td style=\"text-align:center;\"> 0.142 </td>\n   <td style=\"text-align:center;\"> 0.456 </td>\n   <td style=\"text-align:center;\"> 0.475 </td>\n   <td style=\"text-align:center;\"> 0.789 </td>\n   <td style=\"text-align:center;\"> 0.864 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 12.0 </td>\n   <td style=\"text-align:center;\"> 0.855 </td>\n   <td style=\"text-align:center;\"> 0.201 </td>\n   <td style=\"text-align:center;\"> 0.462 </td>\n   <td style=\"text-align:center;\"> 0.408 </td>\n   <td style=\"text-align:center;\"> 0.994 </td>\n   <td style=\"text-align:center;\"> 0.798 </td>\n   <td style=\"text-align:center;\"> 0.469 </td>\n   <td style=\"text-align:center;\"> 0.106 </td>\n   <td style=\"text-align:center;\"> 0.079 </td>\n   <td style=\"text-align:center;\"> 0.232 </td>\n  </tr>\n  <tr>\n   \n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> p24 </td>\n   <td style=\"text-align:center;\"> 0.0 </td>\n   <td style=\"text-align:center;\"> 0.335 </td>\n   <td style=\"text-align:center;\"> 0.629 </td>\n   <td style=\"text-align:center;\"> 0.520 </td>\n   <td style=\"text-align:center;\"> 0.939 </td>\n   <td style=\"text-align:center;\"> 0.666 </td>\n   <td style=\"text-align:center;\"> 0.657 </td>\n   <td style=\"text-align:center;\"> 0.148 </td>\n   <td style=\"text-align:center;\"> 0.853 </td>\n   <td style=\"text-align:center;\"> 0.921 </td>\n   <td style=\"text-align:center;\"> 0.517 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 6.5 </td>\n   <td style=\"text-align:center;\"> 0.652 </td>\n   <td style=\"text-align:center;\"> 0.281 </td>\n   <td style=\"text-align:center;\"> 0.681 </td>\n   <td style=\"text-align:center;\"> 0.414 </td>\n   <td style=\"text-align:center;\"> 0.438 </td>\n   <td style=\"text-align:center;\"> 0.599 </td>\n   <td style=\"text-align:center;\"> 0.999 </td>\n   <td style=\"text-align:center;\"> 0.199 </td>\n   <td style=\"text-align:center;\"> 0.881 </td>\n   <td style=\"text-align:center;\"> 0.351 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 12.0 </td>\n   <td style=\"text-align:center;\"> 0.398 </td>\n   <td style=\"text-align:center;\"> 0.724 </td>\n   <td style=\"text-align:center;\"> 0.227 </td>\n   <td style=\"text-align:center;\"> 0.202 </td>\n   <td style=\"text-align:center;\"> 0.128 </td>\n   <td style=\"text-align:center;\"> 0.465 </td>\n   <td style=\"text-align:center;\"> 0.445 </td>\n   <td style=\"text-align:center;\"> 0.156 </td>\n   <td style=\"text-align:center;\"> 0.516 </td>\n   <td style=\"text-align:center;\"> 0.159 </td>\n  </tr>\n  <tr>\n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"12\"> IgG </td>\n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> Con.6.gp120.B </td>\n   <td style=\"text-align:center;\"> 6.5 </td>\n   <td style=\"text-align:center;\"> 0.017 </td>\n   <td style=\"text-align:center;\"> 0.536 </td>\n   <td style=\"text-align:center;\"> 0.420 </td>\n   <td style=\"text-align:center;\"> 0.890 </td>\n   <td style=\"text-align:center;\"> 0.325 </td>\n   <td style=\"text-align:center;\"> 0.427 </td>\n   <td style=\"text-align:center;\"> 0.853 </td>\n   <td style=\"text-align:center;\"> 0.806 </td>\n   <td style=\"text-align:center;\"> 0.634 </td>\n   <td style=\"text-align:center;\"> 0.250 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 12.0 </td>\n   <td style=\"text-align:center;\"> 0.031 </td>\n   <td style=\"text-align:center;\"> 0.878 </td>\n   <td style=\"text-align:center;\"> 0.377 </td>\n   <td style=\"text-align:center;\"> 0.635 </td>\n   <td style=\"text-align:center;\"> 0.989 </td>\n   <td style=\"text-align:center;\"> 0.540 </td>\n   <td style=\"text-align:center;\"> 0.569 </td>\n   <td style=\"text-align:center;\"> 0.374 </td>\n   <td style=\"text-align:center;\"> 0.686 </td>\n   <td style=\"text-align:center;\"> 0.299 </td>\n  </tr>\n  <tr>\n   \n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> gp41 </td>\n   <td style=\"text-align:center;\"> 0.0 </td>\n   <td style=\"text-align:center;\"> 0.040 </td>\n   <td style=\"text-align:center;\"> 0.894 </td>\n   <td style=\"text-align:center;\"> 0.304 </td>\n   <td style=\"text-align:center;\"> 0.967 </td>\n   <td style=\"text-align:center;\"> 0.442 </td>\n   <td style=\"text-align:center;\"> 0.231 </td>\n   <td style=\"text-align:center;\"> 0.504 </td>\n   <td style=\"text-align:center;\"> 0.595 </td>\n   <td style=\"text-align:center;\"> 0.856 </td>\n   <td style=\"text-align:center;\"> 0.509 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 6.5 </td>\n   <td style=\"text-align:center;\"> 0.057 </td>\n   <td style=\"text-align:center;\"> 0.274 </td>\n   <td style=\"text-align:center;\"> 0.226 </td>\n   <td style=\"text-align:center;\"> 0.891 </td>\n   <td style=\"text-align:center;\"> 0.858 </td>\n   <td style=\"text-align:center;\"> 0.165 </td>\n   <td style=\"text-align:center;\"> 0.601 </td>\n   <td style=\"text-align:center;\"> 0.426 </td>\n   <td style=\"text-align:center;\"> 0.212 </td>\n   <td style=\"text-align:center;\"> 0.606 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 12.0 </td>\n   <td style=\"text-align:center;\"> 0.779 </td>\n   <td style=\"text-align:center;\"> 0.310 </td>\n   <td style=\"text-align:center;\"> 0.443 </td>\n   <td style=\"text-align:center;\"> 0.228 </td>\n   <td style=\"text-align:center;\"> 0.680 </td>\n   <td style=\"text-align:center;\"> 0.268 </td>\n   <td style=\"text-align:center;\"> 0.995 </td>\n   <td style=\"text-align:center;\"> 0.039 </td>\n   <td style=\"text-align:center;\"> 0.357 </td>\n   <td style=\"text-align:center;\"> 0.054 </td>\n  </tr>\n  <tr>\n   \n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> gp70 B.CaseA V1-V2 </td>\n   <td style=\"text-align:center;\"> 6.5 </td>\n   <td style=\"text-align:center;\"> 0.621 </td>\n   <td style=\"text-align:center;\"> 0.853 </td>\n   <td style=\"text-align:center;\"> 0.300 </td>\n   <td style=\"text-align:center;\"> 0.458 </td>\n   <td style=\"text-align:center;\"> 0.507 </td>\n   <td style=\"text-align:center;\"> 0.690 </td>\n   <td style=\"text-align:center;\"> 0.506 </td>\n   <td style=\"text-align:center;\"> 0.161 </td>\n   <td style=\"text-align:center;\"> 0.039 </td>\n   <td style=\"text-align:center;\"> 0.880 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 12.0 </td>\n   <td style=\"text-align:center;\"> 0.037 </td>\n   <td style=\"text-align:center;\"> 0.665 </td>\n   <td style=\"text-align:center;\"> 0.808 </td>\n   <td style=\"text-align:center;\"> 0.447 </td>\n   <td style=\"text-align:center;\"> 0.318 </td>\n   <td style=\"text-align:center;\"> 0.296 </td>\n   <td style=\"text-align:center;\"> 0.743 </td>\n   <td style=\"text-align:center;\"> 0.143 </td>\n   <td style=\"text-align:center;\"> 0.403 </td>\n   <td style=\"text-align:center;\"> 0.276 </td>\n  </tr>\n  <tr>\n   \n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> p24 </td>\n   <td style=\"text-align:center;\"> 0.0 </td>\n   <td style=\"text-align:center;\"> 0.451 </td>\n   <td style=\"text-align:center;\"> 0.231 </td>\n   <td style=\"text-align:center;\"> 0.986 </td>\n   <td style=\"text-align:center;\"> 0.088 </td>\n   <td style=\"text-align:center;\"> 0.149 </td>\n   <td style=\"text-align:center;\"> 0.051 </td>\n   <td style=\"text-align:center;\"> 0.882 </td>\n   <td style=\"text-align:center;\"> 0.507 </td>\n   <td style=\"text-align:center;\"> 0.384 </td>\n   <td style=\"text-align:center;\"> 0.439 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 6.5 </td>\n   <td style=\"text-align:center;\"> 0.404 </td>\n   <td style=\"text-align:center;\"> 0.828 </td>\n   <td style=\"text-align:center;\"> 0.243 </td>\n   <td style=\"text-align:center;\"> 0.250 </td>\n   <td style=\"text-align:center;\"> 0.689 </td>\n   <td style=\"text-align:center;\"> 0.082 </td>\n   <td style=\"text-align:center;\"> 0.180 </td>\n   <td style=\"text-align:center;\"> 0.846 </td>\n   <td style=\"text-align:center;\"> 0.360 </td>\n   <td style=\"text-align:center;\"> 0.101 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 12.0 </td>\n   <td style=\"text-align:center;\"> 0.182 </td>\n   <td style=\"text-align:center;\"> 0.721 </td>\n   <td style=\"text-align:center;\"> 0.237 </td>\n   <td style=\"text-align:center;\"> 0.676 </td>\n   <td style=\"text-align:center;\"> 0.933 </td>\n   <td style=\"text-align:center;\"> 0.371 </td>\n   <td style=\"text-align:center;\"> 0.042 </td>\n   <td style=\"text-align:center;\"> 0.423 </td>\n   <td style=\"text-align:center;\"> 0.716 </td>\n   <td style=\"text-align:center;\"> 0.147 </td>\n  </tr>\n  <tr>\n   \n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> ZM96.gp140 </td>\n   <td style=\"text-align:center;\"> 6.5 </td>\n   <td style=\"text-align:center;\"> 0.030 </td>\n   <td style=\"text-align:center;\"> 0.631 </td>\n   <td style=\"text-align:center;\"> 0.244 </td>\n   <td style=\"text-align:center;\"> 0.750 </td>\n   <td style=\"text-align:center;\"> 0.234 </td>\n   <td style=\"text-align:center;\"> 0.949 </td>\n   <td style=\"text-align:center;\"> 0.890 </td>\n   <td style=\"text-align:center;\"> 0.994 </td>\n   <td style=\"text-align:center;\"> 0.090 </td>\n   <td style=\"text-align:center;\"> 0.504 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 12.0 </td>\n   <td style=\"text-align:center;\"> 0.186 </td>\n   <td style=\"text-align:center;\"> 0.353 </td>\n   <td style=\"text-align:center;\"> 0.286 </td>\n   <td style=\"text-align:center;\"> 0.722 </td>\n   <td style=\"text-align:center;\"> 0.889 </td>\n   <td style=\"text-align:center;\"> 0.409 </td>\n   <td style=\"text-align:center;\"> 0.459 </td>\n   <td style=\"text-align:center;\"> 0.376 </td>\n   <td style=\"text-align:center;\"> 0.021 </td>\n   <td style=\"text-align:center;\"> 0.189 </td>\n  </tr>\n  <tr>\n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> CD4+ </td>\n   <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> Any ENV PTEG </td>\n   <td style=\"text-align:center;\"> 6.5 </td>\n   <td style=\"text-align:center;\"> 0.237 </td>\n   <td style=\"text-align:center;\"> 0.555 </td>\n   <td style=\"text-align:center;\"> 0.771 </td>\n   <td style=\"text-align:center;\"> 0.108 </td>\n   <td style=\"text-align:center;\"> 0.237 </td>\n   <td style=\"text-align:center;\"> 0.104 </td>\n   <td style=\"text-align:center;\"> 0.717 </td>\n   <td style=\"text-align:center;\"> 0.084 </td>\n   <td style=\"text-align:center;\"> 0.627 </td>\n   <td style=\"text-align:center;\"> 0.319 </td>\n  </tr>\n  <tr>\n   \n   \n   <td style=\"text-align:center;\"> 12.0 </td>\n   <td style=\"text-align:center;\"> 0.248 </td>\n   <td style=\"text-align:center;\"> 0.565 </td>\n   <td style=\"text-align:center;\"> 0.434 </td>\n   <td style=\"text-align:center;\"> 0.103 </td>\n   <td style=\"text-align:center;\"> 0.117 </td>\n   <td style=\"text-align:center;\"> 0.856 </td>\n   <td style=\"text-align:center;\"> 0.861 </td>\n   <td style=\"text-align:center;\"> 0.128 </td>\n   <td style=\"text-align:center;\"> 0.773 </td>\n   <td style=\"text-align:center;\"> 0.723 </td>\n  </tr>\n</tbody>\n</table>"
allMDS.html %>% cat(file = 'tables/allMDS.html')
allMDS %>%
kable("latex", escape = F, digits = 3, align = 'c', booktabs = T) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> allMDS.latex


allMDS.latex %>% cat(file = 'tables/allMDS.tex')
# Print latex table to tex file
cat(docHead, allMDS.latex, docTail, file = 'tables/allMDS.tex')
write_csv(allMDS, 'tables/allMDSGlmPValues.csv')

Jensen Shannon Kernel Regression

at each taxonomic level

Agglomeration

# How many taxa do we see if we agglomerate at different levels
psN2 %>% tax_table %>% as.data.frame %>% dplyr::select(Phylum:Genus) %>% colnames -> taxLevels

data_frame(taxLevels) %>%
mutate(ntaxa = map(taxLevels,
    function(lev){
        psN2 %>% tax_glom(lev) %>% ntaxa
    }
                                             )) %>%
mutate(ntaxa = unlist(ntaxa)) %>%
pass -> NTaxaAtLevel
NTaxaAtLevel
## # A tibble: 5 x 2
##   taxLevels ntaxa
##   <chr>     <int>
## 1 Phylum        5
## 2 Class        12
## 3 Order        17
## 4 Family       39
## 5 Genus       104
data_frame(taxLevels = "Species", ntaxa = ntaxa(psN2), ps = list((psN2))) -> specRow
data_frame(taxLevels = "Species", ntaxa = ntaxa(psN1), psCount = list((psN1))) -> specRowC
D2K_savename <- function(distmat){
    # cascade names forward with the D2K operation
    require(MiRKAT)
    out <- MiRKAT::D2K(distmat)
    colnames(out) <- colnames(distmat)
    rownames(out) <- rownames(distmat)
    out
}
# Data frame of phyloseq objects distances and kernels at a bunch of taxonomic levels
NTaxaAtLevel %>%
mutate(ps = map(ntaxa, ~tip_glom_saveid(psN2, k = .))) %>%
# process the phyloseq objects so they have better names
mutate(ps = map(ps, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.)), oldname = 'oldname2'))) %>%
# add in the species data row (which should already have correct names)
bind_rows(specRow) %>%
# calculate jensen-shannon distance matrix
mutate(jsd = map(ps, ~phyloseq::distance(., method = "jsd") )) %>%
# convert to 2d matrix
mutate(jsdMat = map(jsd, ~as.matrix(.))) %>%
# calculate kernel
mutate(kjsd = map(jsdMat, ~D2K_savename(.))) -> tmp
## Loading required package: cluster
## Warning: Setting row names on a tibble is deprecated.

## Warning: Setting row names on a tibble is deprecated.

## Warning: Setting row names on a tibble is deprecated.

## Warning: Setting row names on a tibble is deprecated.

## Warning: Setting row names on a tibble is deprecated.
tmp %>%
mutate(psNoZero = map(ps, ~transform_sample_counts(., function(x) x+(1/1000)))) -> tmp

tmp %>%
## chemometrics::clr just works, while compositions::clr throws a criptic error message here
mutate(clr = map(psNoZero, ~ transform_otu_table(., chemometrics::clr))) %>%
#mutate(clr = map(psNoZero, ~ transform_otu_table(., function(x) as.matrix(compositions::clr(x))))) %>%
pass -> psDf0 # Original way
# Data frame of phyloseq objects distances and kernels at a bunch of taxonomic levels
# I use psN1 because I need count data for some downstream steps.
NTaxaAtLevel %>%
mutate(psCount = map(ntaxa, ~tip_glom_saveid(psN1, k = .))) %>%
# process the phyloseq objects so they have better names
mutate(psCount = map(psCount, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.)), oldname = 'oldname2'))) %>%
# add in the species data row (which should already have correct names)
bind_rows(specRowC) %>%
pass -> tmp
## Warning: Setting row names on a tibble is deprecated.

## Warning: Setting row names on a tibble is deprecated.

## Warning: Setting row names on a tibble is deprecated.

## Warning: Setting row names on a tibble is deprecated.

## Warning: Setting row names on a tibble is deprecated.
tmp %>%
# calculate jensen-shannon distance matrix
mutate(ps = map(psCount, ~transform_sample_counts(., function(x) {x/sum(x)}))) %>%
mutate(jsd = map(ps, ~phyloseq::distance(., method = "jsd") )) %>%
# convert to 2d matrix
mutate(jsdMat = map(jsd, ~as.matrix(.))) %>%
# calculate kernel
mutate(kjsd = map(jsdMat, ~D2K_savename(.))) -> tmp

tmp %>%
mutate(psNoZero = map(ps, ~transform_sample_counts(., function(x) x+(1/1000)))) %>%
## chemometrics::clr just works, while compositions::clr throws a criptic error message here
mutate(clr = map(psNoZero, ~ transform_otu_table(., chemometrics::clr))) %>%
#mutate(clr = map(psNoZero, ~ transform_otu_table(., function(x) as.matrix(compositions::clr(x))))) %>%
pass -> psDf
print(psDf)
## # A tibble: 6 x 9
##   taxLevels ntaxa psCount   ps      jsd    jsdMat   kjsd   psNoZero clr   
##   <chr>     <int> <list>    <list>  <list> <list>   <list> <list>   <list>
## 1 Phylum        5 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 2 Class        12 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 3 Order        17 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 4 Family       39 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 5 Genus       104 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 6 Species     651 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
MirMulti <- function(x, KsDf = psDf, ps = psN2, nperm = 9999){
    
    Ks = KsDf$kjsd
    
    # I  bind to the phyloseq object and then peel off again later to guerentee
    # that the y-data is in the same order as the Ks
    locPS <- phylo_join(ps, x, by = 'pub_id')
    
    ydata0 <- sample_data(locPS)$mag
    yna <- is.na(ydata0)

    ydata <- ydata0[!yna]
    loc.Ks <- lapply(Ks, function(K){K[!yna, !yna]})  
  
    bcxJSD <- MiRKAT(y = jac_box_cox_2(ydata), Ks = loc.Ks, out_type = "C", method = 'permutation', nperm = nperm)
    medJSD <- MiRKAT(y = medcode(ydata), Ks = loc.Ks, out_type = "D", method = 'permutation', nperm = nperm)
    mmDf = data.frame(
        taxLevels = KsDf$taxLevels,
        ntaxa = KsDf$ntaxa,
        bcxJSD = bcxJSD$indivP, medJSD = medJSD$indivP,
        bcxJSDOmni = bcxJSD$omnibus_p, medJSDOmni = medJSD$omnibus_p)
    mmDf
    
    }
# Test case

use.immune %>%
filter(type == 'IgG' & antigen == 'gp41'& visitno == 2 & ct == 'T') -> test.immune1

test.mm <- MirMulti(test.immune1, Ks = psDf, nperm = 999)

test.mm
##   taxLevels ntaxa bcxJSD medJSD bcxJSDOmni medJSDOmni
## 1    Phylum     5  0.217  0.228      0.162     0.0435
## 2     Class    12  0.282  0.228      0.162     0.0435
## 3     Order    17  0.048  0.011      0.162     0.0435
## 4    Family    39  0.123  0.031      0.162     0.0435
## 5     Genus   104  0.247  0.038      0.162     0.0435
## 6   Species   651  0.505  0.131      0.162     0.0435
ptm = proc.time()

use.immune %>%
group_by(type, antigen, month) %>%
nest %>%
mutate(mir = map(data,
    ~MirMulti(., Ks = psDf, ps = psN2, nperm = 999)
)) %>%
dplyr::select(-data) %>% unnest(mir) %>%
pass -> mirLevels

proc.time() - ptm
##    user  system elapsed 
##   2.561   0.000   2.561
mirLevels %>% dplyr::select(-ntaxa, -medJSD) %>% spread(key = taxLevels, value = bcxJSD)
## # A tibble: 20 x 11
##    type  antigen month bcxJSDOmni medJSDOmni  Class Family  Genus  Order
##    <chr> <chr>   <dbl>      <dbl>      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 CD4+  ANY.EN…   6.5     0.762      0.285  0.497  0.512  0.704  0.396 
##  2 CD4+  ANY.EN…  12       0.054      0.148  0.014  0.146  0.136  0.057 
##  3 IgA   gp41      0       0.995      0.864  0.997  0.875  0.916  0.953 
##  4 IgA   gp41      6.5     0.259      0.362  0.316  0.546  0.401  0.395 
##  5 IgA   gp41     12       0.525      0.916  0.878  0.419  0.349  0.712 
##  6 IgA   p24       0       0.886      0.388  0.921  0.815  0.897  0.917 
##  7 IgA   p24       6.5     0.982      0.830  0.905  0.933  0.8    0.852 
##  8 IgA   p24      12       0.444      0.102  0.167  0.604  0.639  0.298 
##  9 IgG   Con.6.…   6.5     0.388      0.052  0.233  0.178  0.178  0.208 
## 10 IgG   Con.6.…  12       0.003      0.053  0.04   0.002  0.001  0.002 
## 11 IgG   gp41      0       0.158      0.0365 0.269  0.124  0.229  0.047 
## 12 IgG   gp41      6.5     0.334      0.0765 0.115  0.26   0.148  0.131 
## 13 IgG   gp41     12       0.310      0.722  0.156  0.3    0.179  0.135 
## 14 IgG   gp70_B…   6.5     0.526      0.949  0.217  0.415  0.429  0.376 
## 15 IgG   gp70_B…  12       0.114      0.086  0.169  0.0720 0.0920 0.032 
## 16 IgG   p24       0       0.893      0.476  0.718  0.951  0.611  0.913 
## 17 IgG   p24       6.5     0.0185     0.853  0.005  0.051  0.028  0.047 
## 18 IgG   p24      12       0.782      0.696  0.971  0.469  0.558  0.41  
## 19 IgG   ZM96.g…   6.5     0.097      0.0405 0.0860 0.113  0.057  0.124 
## 20 IgG   ZM96.g…  12       0.280      0.350  0.464  0.184  0.105  0.0960
## # ... with 2 more variables: Phylum <dbl>, Species <dbl>
mirLevels %>% dplyr::select(-ntaxa, -bcxJSD) %>% spread(key = taxLevels, value = medJSD)
## # A tibble: 20 x 11
##    type  antigen month bcxJSDOmni medJSDOmni  Class Family  Genus  Order
##    <chr> <chr>   <dbl>      <dbl>      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 CD4+  ANY.EN…   6.5     0.762      0.285  0.102   0.193 0.268  0.0970
##  2 CD4+  ANY.EN…  12       0.054      0.148  0.047   0.164 0.184  0.061 
##  3 IgA   gp41      0       0.995      0.864  0.855   0.824 0.905  0.917 
##  4 IgA   gp41      6.5     0.259      0.362  0.652   0.485 0.43   0.364 
##  5 IgA   gp41     12       0.525      0.916  0.918   0.683 0.647  0.818 
##  6 IgA   p24       0       0.886      0.388  0.92    0.559 0.791  0.847 
##  7 IgA   p24       6.5     0.982      0.830  0.845   0.831 0.694  0.556 
##  8 IgA   p24      12       0.444      0.102  0.028   0.24  0.401  0.114 
##  9 IgG   Con.6.…   6.5     0.388      0.052  0.0830  0.019 0.014  0.015 
## 10 IgG   Con.6.…  12       0.003      0.053  0.0990  0.038 0.042  0.015 
## 11 IgG   gp41      0       0.158      0.0365 0.211   0.03  0.032  0.01  
## 12 IgG   gp41      6.5     0.334      0.0765 0.059   0.06  0.0690 0.021 
## 13 IgG   gp41     12       0.310      0.722  0.905   0.747 0.332  0.698 
## 14 IgG   gp70_B…   6.5     0.526      0.949  0.723   0.949 0.974  0.906 
## 15 IgG   gp70_B…  12       0.114      0.086  0.525   0.044 0.025  0.023 
## 16 IgG   p24       0       0.893      0.476  0.397   0.287 0.182  0.752 
## 17 IgG   p24       6.5     0.0185     0.853  0.479   0.673 0.502  0.69  
## 18 IgG   p24      12       0.782      0.696  0.745   0.329 0.374  0.348 
## 19 IgG   ZM96.g…   6.5     0.097      0.0405 0.0820  0.062 0.0720 0.043 
## 20 IgG   ZM96.g…  12       0.280      0.350  0.143   0.523 0.498  0.341 
## # ... with 2 more variables: Phylum <dbl>, Species <dbl>

I’d like to combine the above into one table, when it isn’t 7:45. Probably has soemething to do with merging columns or something. Or maybe I just want to plot it as a figure.

mirLevels %>%
gather(metric, P, bcxJSD:medJSD) -> mirDat
mirDat %>% dplyr::select(type:month, bcxJSD = bcxJSDOmni, medJSD = medJSDOmni) %>%
group_by(type, antigen, month) %>%
summarize(bcxJSD = mean(bcxJSD), medJSD = mean(medJSD)) %>%
gather(metric, P, bcxJSD, medJSD) -> mirOmni
mirOmni
## # A tibble: 40 x 5
## # Groups:   type, antigen [8]
##    type  antigen       month metric     P
##    <chr> <chr>         <dbl> <chr>  <dbl>
##  1 CD4+  ANY.ENV.PTEG    6.5 bcxJSD 0.762
##  2 CD4+  ANY.ENV.PTEG   12   bcxJSD 0.054
##  3 IgA   gp41            0   bcxJSD 0.995
##  4 IgA   gp41            6.5 bcxJSD 0.259
##  5 IgA   gp41           12   bcxJSD 0.525
##  6 IgA   p24             0   bcxJSD 0.886
##  7 IgA   p24             6.5 bcxJSD 0.982
##  8 IgA   p24            12   bcxJSD 0.444
##  9 IgG   Con.6.gp120.B   6.5 bcxJSD 0.388
## 10 IgG   Con.6.gp120.B  12   bcxJSD 0.003
## # ... with 30 more rows
NTaxaAtLevel %>% bind_rows(specRow[,1:2]) %>% unite(nLev, taxLevels, ntaxa, remove = FALSE) -> NTaxaAtLevel2
NTaxaAtLevel2
## # A tibble: 6 x 3
##   nLev        taxLevels ntaxa
##   <chr>       <chr>     <int>
## 1 Phylum_5    Phylum        5
## 2 Class_12    Class        12
## 3 Order_17    Order        17
## 4 Family_39   Family       39
## 5 Genus_104   Genus       104
## 6 Species_651 Species     651
bind_rows(
    permKernTable %>% mutate(metric = 'med'),
    permKernTableGaus %>% mutate(metric = 'bcx')
    ) %>%
dplyr::select(type, antigen, month, metric, mir.P) %>%
pass -> WufPData
fixant <- function(df){
    df %>%
    mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
    mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
    #mutate(metric =  stringr::str_replace_all(metric, "bcx", "")) %>%
    pass
}

fixstuff <- function(df){
    df %>%
    fixant %>%
    mutate(metric =  stringr::str_replace_all(metric, "JSD", "")) %>%
    pass
}
mirDat %>% fixant %>% head
## # A tibble: 6 x 9
##   type  antigen month taxLevels ntaxa bcxJSDOmni medJSDOmni metric      P
##   <chr> <chr>   <dbl> <fct>     <int>      <dbl>      <dbl> <chr>   <dbl>
## 1 IgA   gp41      6.5 Phylum        5      0.259      0.362 bcxJSD 0.0970
## 2 IgA   gp41      6.5 Class        12      0.259      0.362 bcxJSD 0.316 
## 3 IgA   gp41      6.5 Order        17      0.259      0.362 bcxJSD 0.395 
## 4 IgA   gp41      6.5 Family       39      0.259      0.362 bcxJSD 0.546 
## 5 IgA   gp41      6.5 Genus       104      0.259      0.362 bcxJSD 0.401 
## 6 IgA   gp41      6.5 Species     651      0.259      0.362 bcxJSD 0.731
mirDat %>% 
#mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>% 
fixstuff %>%
ggplot(aes(x = ntaxa, y = P, col = factor(month), fill = factor(month))) +
geom_point(pch = 21) +
facet_grid(type + antigen ~ metric, labeller = labeller(antigen = label_wrap_gen(width = 10))) + 
scale_x_log10(breaks = c(3, NTaxaAtLevel2$ntaxa, 1000), labels = c("omni", NTaxaAtLevel2$nLev, "wunifrac")) +
scale_y_log10(breaks = c(0.002, 0.01, 0.05, 0.2, 1)) + 
geom_hline(yintercept=0.05, col = 'blue', alpha = 0.5) + geom_hline(yintercept=0.01, col = 'red', alpha = 0.5) +
#geom_hline(data = mirOmni, aes(yintercept = P, col = factor(month))) +
#annotation_logticks(sides = 'bl') +
#geom_rug(data = mirOmni, aes(y = P, col = factor(month)), inherit.aes = F) +
geom_point(data = mirOmni %>% ungroup %>% fixstuff,
           aes(x = 3, y = P, col = factor(month), fill = factor(month)), inherit.aes = F, pch = 22, size = 2) +
geom_point(data = WufPData %>% ungroup %>% fixant,
           aes(x = 1000, y = mir.P, col = factor(month), fill = factor(month)), inherit.aes = F, pch = 24, size = 2) +

scale_colour_manual(values=cbPalette) + 
scale_fill_manual(values=alpha(cbPalette, 0.5)) + 
labs(col = "month", fill = "month") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
     strip.text.y = (element_text(angle = 90))) -> pjsd0

options(repr.plot.width=8, repr.plot.height= 6)
pjsd0

options(par0)
# I'd like to add weighted unifrac as a tick mark on the right.
NTaxaAtLevel2
## # A tibble: 6 x 3
##   nLev        taxLevels ntaxa
##   <chr>       <chr>     <int>
## 1 Phylum_5    Phylum        5
## 2 Class_12    Class        12
## 3 Order_17    Order        17
## 4 Family_39   Family       39
## 5 Genus_104   Genus       104
## 6 Species_651 Species     651
# New combined data frame that has omnibus, regular, and wunifrac all in one
bind_rows(
     mirDat %>% mutate(metric =  stringr::str_replace_all(metric, "JSD", "")) %>%
    mutate(test = "JSD"),
     mirOmni %>% ungroup %>% mutate(metric =  stringr::str_replace_all(metric, "JSD", "")) %>%
    mutate(taxLevels = "Omnibus") %>% mutate(test = "Omnibus"),
     WufPData %>% ungroup %>% dplyr::rename(P = mir.P) %>%
    mutate(taxLevels = "WUnifrac") %>% mutate(test = "WUnifrac")
) %>% 
mutate(antigen = factor(antigen, levels = c(ants2, ants1, "ANY.ENV.PTEG"))) %>%
mutate(type = factor(type, levels = c("IgA", "IgG", "CD4+"))) %>%
mutate(taxLevels = factor(taxLevels, levels = c("Omnibus", NTaxaAtLevel2$taxLevels, "WUnifrac"))) %>%
dplyr::select(-c(bcxJSDOmni:medJSDOmni))%>%
unite(nLev, taxLevels, ntaxa, remove = FALSE) %>%
mutate(nLev = stringr::str_replace(nLev, "_NA", "")) %>%
#mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
#mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
 mutate(antigen = factor(antigen, labels = stringr::str_replace_all(levels(antigen), "\\.", " "))) %>%
 mutate(antigen = factor(antigen, labels = stringr::str_replace_all(levels(antigen), "_", " "))) %>%


# mutate(antigen = factor(antigen, labels = (levels(antigen)))) %>%
# mutate(antigen = factor(antigen, labels = (levels(antigen)))) %>%

mutate(test = factor(test, levels = c('Omnibus', 'JSD', 'WUnifrac'))) %>%
pass -> mirDat2
## Warning in bind_rows_(x, .id): binding factor and character vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
mirDat2 %>% filter(type == 'CD4+')
## # A tibble: 32 x 9
##    type  antigen      month nLev        taxLevels ntaxa metric     P test 
##    <fct> <fct>        <dbl> <chr>       <fct>     <int> <chr>  <dbl> <fct>
##  1 CD4+  ANY ENV PTEG   6.5 Phylum_5    Phylum        5 bcx    0.86  JSD  
##  2 CD4+  ANY ENV PTEG   6.5 Class_12    Class        12 bcx    0.497 JSD  
##  3 CD4+  ANY ENV PTEG   6.5 Order_17    Order        17 bcx    0.396 JSD  
##  4 CD4+  ANY ENV PTEG   6.5 Family_39   Family       39 bcx    0.512 JSD  
##  5 CD4+  ANY ENV PTEG   6.5 Genus_104   Genus       104 bcx    0.704 JSD  
##  6 CD4+  ANY ENV PTEG   6.5 Species_651 Species     651 bcx    0.919 JSD  
##  7 CD4+  ANY ENV PTEG  12   Phylum_5    Phylum        5 bcx    0.908 JSD  
##  8 CD4+  ANY ENV PTEG  12   Class_12    Class        12 bcx    0.014 JSD  
##  9 CD4+  ANY ENV PTEG  12   Order_17    Order        17 bcx    0.057 JSD  
## 10 CD4+  ANY ENV PTEG  12   Family_39   Family       39 bcx    0.146 JSD  
## # ... with 22 more rows
mirDat2 %>% head
## # A tibble: 6 x 9
##   type  antigen month nLev        taxLevels ntaxa metric      P test 
##   <fct> <fct>   <dbl> <chr>       <fct>     <int> <chr>   <dbl> <fct>
## 1 IgA   gp41      6.5 Phylum_5    Phylum        5 bcx    0.0970 JSD  
## 2 IgA   gp41      6.5 Class_12    Class        12 bcx    0.316  JSD  
## 3 IgA   gp41      6.5 Order_17    Order        17 bcx    0.395  JSD  
## 4 IgA   gp41      6.5 Family_39   Family       39 bcx    0.546  JSD  
## 5 IgA   gp41      6.5 Genus_104   Genus       104 bcx    0.401  JSD  
## 6 IgA   gp41      6.5 Species_651 Species     651 bcx    0.731  JSD
mirDat2 %>%
filter(type == "IgG") %>%
ggplot(aes(x = taxLevels, y = P, col = factor(month), fill = factor(month), shape = test)) +
geom_point(size = 2) +
facet_grid(type + antigen ~ metric, labeller = labeller(antigen = label_wrap_gen(width = 10))) + 
#scale_x_log10(breaks = c(3, NTaxaAtLevel2$ntaxa, 1000), labels = c("omni", NTaxaAtLevel2$nLev, "wunifrac")) +
scale_y_log10(breaks = c(0.002, 0.01, 0.05, 0.2, 1)) + 
geom_hline(yintercept=0.05, col = 'blue', alpha = 0.5) + geom_hline(yintercept=0.01, col = 'red', alpha = 0.5) +

scale_shape_manual(values = c(22, 21, 24)) +
scale_colour_manual(values=cbPalette) + 
scale_fill_manual(values=alpha(cbPalette, 0.5)) + 
labs(col = "month", fill = "month", x = "Taxonomic Level") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
     strip.text.y = (element_text(angle = 90))) -> pjsd


options(repr.plot.width=6, repr.plot.height= 8)
pjsd

ggsave('figures/KernelPVsLevel.png', width = 6, height = 8)
options(par0)

X-axis is now spaced evenly

Table SX. P values of kernel regression tests. Circles indicate jensen shannon values at different taxonomic resolutions. Squares are the omnibus p-value for that cohort of tests. Triangles indicate kernel regression p-values for the corresponding weighted unifrac test.

The blue and red lines indicate p values of 5% and 1% respectively.

Observations: The weighted unifrac test is sensitive. In cases where only one taxonic level hits, weighted unifrac often also falls at some statistically significant value. The omnibus p value is often higher than the weighted unifrac one. Weighted unifrac seems like a good test for identifying patterns at any level that relate to an outcome. The jensen shannon informs us about which level the pattern is observed.

Local Tests

Family, genera and species vs wuf1

I might even be able to drill down to every level.

model_each_species <- function(ps, f, pthresh = 1, q = FALSE){
    # Start with the otu table
ps %>%
# reshape it so we have clr values for every taxon-sample pair
otu_table %>% as.data.frame %>% rownames_to_column("Sample") %>% gather(Taxon, clr, -Sample) %>%
    # bind that to the sample data
    # doing this here seems remarkably inefficient, but its not creating a bottleneck so I'll leave it.
left_join(
    ps %>%
    # the sample data need to have MDS1 and MDS2 appended to them
    phylo_join(
    psN2.pcoa %>% scores(display = "sites") %>% # hardcoded psN2.pcoa
        as.data.frame %>% 
        rownames_to_column %>% 
        dplyr::select('rowname', 'MDS1', 'MDS2'),
    by = 'rowname'
) %>%
    # back to binding to sample data
    sample_data %>% as('data.frame') %>% rownames_to_column("Sample"),
     by = 'Sample') %>%

group_by(Taxon) %>%  # group and nest for model run
nest %>%
mutate(Mod = map(data, f)) %>% # apply model over each species
mutate(Glance = map(Mod, glance), Tidy = map(Mod, tidy)) %>% # extract relevant data from model
# view model
dplyr::select(Taxon, Tidy) %>% unnest %>%
mutate(term = gsub('[\\( \\)]','', term)) %>% # remove parentheses from "(Intercept)"
gather(meas, val, estimate:p.value) %>% 
unite(meas, term, meas) %>% spread(meas, val) %>% arrange(clr_estimate) %>% 
dplyr::select(Taxon, Intercept_estimate, clr_estimate, clr_std.error, clr_p.value) %>%
    # add q value
    {if(q) mutate(., clr_q.value = p2q(clr_p.value)) else .} %>%
    
 filter(clr_p.value < pthresh) %>%

     #Join taxonomy information
     left_join(
     as.data.frame(ps@tax_table@.Data) %>% as.tibble %>% dplyr::select(Kingdom:Genus, Species, tag) %>%
         mutate(tag = as.character(tag)), # mutate so tag is and taxon are both character class
     by = c("Taxon" = "tag")) %>%
pass
 }
model_each_species_for_antigen <- function(antigen, ps = psN2){
    ps %>%
    model_each_species(function(df){glm(medcode2(get(antigen)) ~ clr, data = df, family = 'binomial')}, q = TRUE, pthresh = 1)
}
ColsToRun <- c('IgG_Con.6.gp120.B_Month_6.5', 'IgG_Con.6.gp120.B_Month_12', 'IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_gp70_B.CaseA_V1_V2_Month_12', 'IgG_ZM96.gp140_Month_6.5', 'MDS1', 'isMale' ) 
model_each_species_case <- function(ps){
    
    ps %>% model_each_species(function(df){glm(MDS1 ~ clr, data = df, family = 'gaussian')}, q = TRUE, pthresh = 1) %>%
arrange(clr_estimate) %>%
mutate(Taxon = factor(Taxon, levels = Taxon[order(clr_estimate)])) %>%
    mutate(test = 'gaussian', antigen = 'MDS1') %>%
pass -> loc_mds1Glms
    
        ps %>% model_each_species(function(df){glm(log10(IgG_Con.6.gp120.B_Month_12 + 100) ~ clr, data = df, family = 'gaussian')}, q = TRUE, pthresh = 1) %>%
arrange(clr_estimate) %>%
 mutate(Taxon = factor(Taxon, levels = levels(loc_mds1Glms$Taxon))) %>%
    mutate(test = 'gaussian', antigen = 'Con.6.gp120.B_Month_12') %>%
 pass -> loc_gp120Glms
    
      tibble(antigen = ColsToRun) %>% mutate(model = map(antigen, ~model_each_species_for_antigen(., ps = ps))) %>%
  unnest %>% mutate(Taxon = factor(Taxon, levels = levels(loc_mds1Glms$Taxon))) %>%
        mutate(test = 'binomial') %>%
    pass-> loc_logitCoefs
    

    #list(loc_mds1Glms, loc_gp120Glms, loc_logitCoefs)
     bind_rows(loc_mds1Glms, loc_gp120Glms, loc_logitCoefs) %>% dplyr::select(test, antigen, everything())
    
}
#psDf %>% mutate(ps2 = map(ps, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.))))) -> test
#psDf[[1,"clr"]] %>% tax_table
psDf[[1]]
## [1] "Phylum"  "Class"   "Order"   "Family"  "Genus"   "Species"
psDf %>% print
## # A tibble: 6 x 9
##   taxLevels ntaxa psCount   ps      jsd    jsdMat   kjsd   psNoZero clr   
##   <chr>     <int> <list>    <list>  <list> <list>   <list> <list>   <list>
## 1 Phylum        5 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 2 Class        12 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 3 Order        17 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 4 Family       39 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 5 Genus       104 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 6 Species     651 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
ptm = proc.time()
psDf %>% dplyr::select(taxLevels, ntaxa, clr) %>% mutate(localmod = map(clr, model_each_species_case)) ->psDfLoc
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
proc.time() - ptm
##    user  system elapsed 
##  41.692   0.032  41.727
print(psDfLoc)
## # A tibble: 6 x 4
##   taxLevels ntaxa clr            localmod             
##   <chr>     <int> <list>         <list>               
## 1 Phylum        5 <S4: phyloseq> <tibble [50 × 15]>   
## 2 Class        12 <S4: phyloseq> <tibble [120 × 15]>  
## 3 Order        17 <S4: phyloseq> <tibble [170 × 15]>  
## 4 Family       39 <S4: phyloseq> <tibble [390 × 15]>  
## 5 Genus       104 <S4: phyloseq> <tibble [1,040 × 15]>
## 6 Species     651 <S4: phyloseq> <tibble [6,510 × 15]>
psDfLoc %>% dplyr::select(-clr) %>% unnest(localmod) -> tmp
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
psDfLoc$taxLevels
## [1] "Phylum"  "Class"   "Order"   "Family"  "Genus"   "Species"
tmp %>% mutate(taxLevels = factor(taxLevels, levels = psDfLoc$taxLevels)) -> LocalTests
LocalTests %>% 
filter(antigen != "MDS1") %>%
write_csv("tables/AllLocalTests.csv")

I want to show the local tests vs antibodies.

LocalTests %>% head
## # A tibble: 6 x 17
##   taxLevels ntaxa test  antigen Taxon Intercept_estim… clr_estimate
##   <fct>     <int> <chr> <chr>   <chr>            <dbl>        <dbl>
## 1 Phylum        5 gaus… MDS1    Bact…            1.94        -0.762
## 2 Phylum        5 gaus… MDS1    Porp…           -1.03        -0.514
## 3 Phylum        5 gaus… MDS1    Acti…           -0.201       -0.104
## 4 Phylum        5 gaus… MDS1    Camp…            0.439        0.242
## 5 Phylum        5 gaus… MDS1    Bact…           -1.42         0.445
## 6 Phylum        5 gaus… Con.6.… Bact…            3.76        -0.285
## # ... with 10 more variables: clr_std.error <dbl>, clr_p.value <dbl>,
## #   clr_q.value <dbl>, Kingdom <fct>, Phylum <chr>, Class <chr>,
## #   Order <chr>, Family <chr>, Genus <chr>, Species <chr>
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>% 
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_q.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
 geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) + scale_y_log10() + 
  labs(y = "q-value", x = "Taxonomic Level") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

ggsave('figures/LocalQEveryLevel.png', width = 6, height = 8, units = "in")

BOOKMARK

options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>% 
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_p.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
 geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) +# scale_y_log10() +
  labs(y = expression(paste(italic("p"),"-value")), x = "Taxonomic Level") + 
  
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

ggsave('figures/LocalPEveryLevel.png', width = 6, height = 8, units = "in")
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>% 
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_p.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
 geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) + scale_y_log10() +
labs(y = expression(paste(italic("p"),"-value")), x = "Taxonomic Level") + 
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

ggsave('figures/LocalPEveryLevel_LogScale.png', width = 6, height = 8, units = "in")
order_taxa_by_mds1 <- function(df){
    # this has to be a model_each_species type of data frame
    df %>% filter(antigen == 'MDS1' & test == 'gaussian') %>%
    mutate(TaxonF = factor(Taxon, levels = Taxon[order(clr_estimate)])) -> mds1df
    df %>% mutate(TaxonF = factor(Taxon, levels = levels(mds1df$TaxonF)))
}

To my annoyance, everything is labeled with IgG except gp120_12

LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%

filter(
    (antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5'))|
    (antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
      ) %>%
mutate(antigen = factor(antigen,
                        levels = c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5',
                                   'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12'))) %>%
filter(clr_p.value < 0.05 & clr_q.value < 0.2) %>%
dplyr::select(antigen:clr_estimate) %>%
dplyr::select(-Intercept_estimate) %>%
mutate(cordir = sign(clr_estimate)) %>%
pass
## # A tibble: 11 x 4
##    antigen                Taxon                clr_estimate cordir
##    <fct>                  <chr>                       <dbl>  <dbl>
##  1 Con.6.gp120.B_Month_12 Bacteroidetes.4            -0.307     -1
##  2 Con.6.gp120.B_Month_12 Firmicutes.4               -0.295     -1
##  3 Con.6.gp120.B_Month_12 Firmicutes.2               -0.157     -1
##  4 Con.6.gp120.B_Month_12 Clostridia                 -0.143     -1
##  5 Con.6.gp120.B_Month_12 Bacteroides                 0.204      1
##  6 Con.6.gp120.B_Month_12 Firmicutes.5                0.207      1
##  7 IgG_gp41_Month_0       Porphyromonadaceae.1       -2.12      -1
##  8 IgG_gp41_Month_0       Porphyromonadaceae         -1.69      -1
##  9 IgG_gp41_Month_0       Bacteroidetes.1            -1.20      -1
## 10 IgG_gp41_Month_0       Clostridia                  1.31       1
## 11 IgG_gp41_Month_0       Anaerococcus                1.53       1
LocalTests %>% pull(antigen) %>% unique
## [1] "MDS1"                            "Con.6.gp120.B_Month_12"         
## [3] "IgG_Con.6.gp120.B_Month_6.5"     "IgG_Con.6.gp120.B_Month_12"     
## [5] "IgG_gp41_Month_0"                "IgG_gp41_Month_6.5"             
## [7] "IgG_gp70_B.CaseA_V1_V2_Month_12" "IgG_ZM96.gp140_Month_6.5"       
## [9] "isMale"

labs(y = expression(paste(italic(“p”),“-value”)), x = “Taxonomic Level”) +

# Family Hits
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%

filter(
    (antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5', 
                    'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))|
    (antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
      ) %>%
mutate(antigen = factor(antigen, levels = c(
    'IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12',
    'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'
))) %>%

pass -> tmp

#tmp$antigen %>% unique

tmp %>% filter(clr_p.value < 0.05 & clr_q.value < 0.2) %>%
pull(Taxon) %>% unique -> useFamily

tmp %>% filter(Taxon %in% useFamily) %>%

#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "IgG ", "")) %>%

ggplot(aes(x = TaxonF, y = clr_estimate,
           color = (clr_p.value < 0.05), shape =(clr_q.value < 0.2))) +
geom_point(size = 3) + 
geom_errorbar(aes(ymin = clr_estimate - 2*clr_std.error, ymax = clr_estimate + 2*clr_std.error)) + 
  labs(y = "Coefficient",
       x = "Family Level Taxon",
       col = expression(paste(italic("p"), "-value")),
       shape = "q-value") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + geom_hline(yintercept = 0) +
facet_wrap(~antigen, ncol = 1, scales = 'free_y')

# Show the censored ones accross - so this would be everything with at least one hit
# but also show what they are in all cases.

ggsave('figures/anyFamilyIgg.png', width = 6, height = 8)

I think its worth digging into clostridia and Prophyromonidaceae with stacked bars

Proportionality heatmap

Family level

Lets come back to this after we’ve done the local tests. Since we need them to color code the axes.

psDf %>% print
## # A tibble: 6 x 9
##   taxLevels ntaxa psCount   ps      jsd    jsdMat   kjsd   psNoZero clr   
##   <chr>     <int> <list>    <list>  <list> <list>   <list> <list>   <list>
## 1 Phylum        5 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 2 Class        12 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 3 Order        17 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 4 Family       39 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 5 Genus       104 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 6 Species     651 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
psDf %>% filter(taxLevels == 'Family') %>% dplyr::select(ps) %>% pull %>%.[[1]]
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 39 taxa and 21 samples ]
## sample_data() Sample Data:       [ 21 samples by 35 sample variables ]
## tax_table()   Taxonomy Table:    [ 39 taxa by 12 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 39 tips and 38 internal nodes ]
print(psDf)
## # A tibble: 6 x 9
##   taxLevels ntaxa psCount   ps      jsd    jsdMat   kjsd   psNoZero clr   
##   <chr>     <int> <list>    <list>  <list> <list>   <list> <list>   <list>
## 1 Phylum        5 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 2 Class        12 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 3 Order        17 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 4 Family       39 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 5 Genus       104 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 6 Species     651 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
#nFamilyTaxa <- NTaxaAtLevel %>% filter(taxLevels == 'Family') %>% pull(ntaxa)

psDf %>% filter(taxLevels == 'Family') %>% dplyr::select(psNoZero) %>% pull %>%.[[1]] %>%
otu_table %>% as.data.frame %>%
pass -> myRel

ptm = proc.time()
phiBoot <- boot(data = myRel, statistic = boot_phi, R = 1000)
proc.time() - ptm
##    user  system elapsed 
##   6.513   0.000   6.604
ptm = proc.time()
tidyCI <- unwarn(
    tidy(phiBoot,conf.int=TRUE,conf.method="bca")
    )
proc.time() - ptm
##    user  system elapsed 
##  13.825   0.028  13.853
myRel %>% make_proportionality_matrix %>% 
         as.data.frame %>%
         rownames_to_column("TaxonX") %>% gather(TaxonY, phi, -TaxonX) %>%
    filter(TaxonX != TaxonY) %>% data.frame(tidyCI) -> namedTidyCI
head(LocalTests)
## # A tibble: 6 x 17
##   taxLevels ntaxa test  antigen Taxon Intercept_estim… clr_estimate
##   <fct>     <int> <chr> <chr>   <chr>            <dbl>        <dbl>
## 1 Phylum        5 gaus… MDS1    Bact…            1.94        -0.762
## 2 Phylum        5 gaus… MDS1    Porp…           -1.03        -0.514
## 3 Phylum        5 gaus… MDS1    Acti…           -0.201       -0.104
## 4 Phylum        5 gaus… MDS1    Camp…            0.439        0.242
## 5 Phylum        5 gaus… MDS1    Bact…           -1.42         0.445
## 6 Phylum        5 gaus… Con.6.… Bact…            3.76        -0.285
## # ... with 10 more variables: clr_std.error <dbl>, clr_p.value <dbl>,
## #   clr_q.value <dbl>, Kingdom <fct>, Phylum <chr>, Class <chr>,
## #   Order <chr>, Family <chr>, Genus <chr>, Species <chr>
LocalTests %>% filter(test == 'gaussian' &
                        antigen == 'MDS1' &
                         clr_p.value <0.05 &
                        clr_q.value < 0.2&
                        taxLevels == "Family") -> tmp
tmp %>% pull(Taxon) -> MDS1Fam
tmp %>% filter(clr_estimate < 0) %>% pull(Taxon) -> lowMDS1Fam
tmp %>% filter(clr_estimate >= 0) %>% pull(Taxon) -> highMDS1Fam

https://stackoverflow.com/questions/48531987/incorporate-more-information-about-variables-on-axes-into-a-heatmap-in-ggplot/48532983#48532983

I’d like to do this, but for gp41 baseline and gp120 as well.

useFamily
##  [1] "Bacteroidetes.4"      "Firmicutes.4"         "Firmicutes.2"        
##  [4] "Clostridia"           "Bacteroides"          "Firmicutes.5"        
##  [7] "Porphyromonadaceae.1" "Porphyromonadaceae"   "Bacteroidetes.1"     
## [10] "Anaerococcus"
LocalTests %>% head
## # A tibble: 6 x 17
##   taxLevels ntaxa test  antigen Taxon Intercept_estim… clr_estimate
##   <fct>     <int> <chr> <chr>   <chr>            <dbl>        <dbl>
## 1 Phylum        5 gaus… MDS1    Bact…            1.94        -0.762
## 2 Phylum        5 gaus… MDS1    Porp…           -1.03        -0.514
## 3 Phylum        5 gaus… MDS1    Acti…           -0.201       -0.104
## 4 Phylum        5 gaus… MDS1    Camp…            0.439        0.242
## 5 Phylum        5 gaus… MDS1    Bact…           -1.42         0.445
## 6 Phylum        5 gaus… Con.6.… Bact…            3.76        -0.285
## # ... with 10 more variables: clr_std.error <dbl>, clr_p.value <dbl>,
## #   clr_q.value <dbl>, Kingdom <fct>, Phylum <chr>, Class <chr>,
## #   Order <chr>, Family <chr>, Genus <chr>, Species <chr>
reshape2::melt
## function (data, ..., na.rm = FALSE, value.name = "value") 
## {
##     UseMethod("melt", data)
## }
## <bytecode: 0x55fe9434db80>
## <environment: namespace:reshape2>
targStat <- "phi"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata

phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)

phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp

p_phi_1 <- ggplot(tmp, aes(Var1, Var2, fill =(value))) + 
geom_tile() +
  scale_fill_gradient(high = "grey90", low = "red", 
    space = "Lab", 
    name="phi",
                    limits = c(NA, 3), na.value = "white") +
#  theme_minimal()+ # minimal theme
 theme(axis.text.x = element_text(
     angle = 90, vjust = 1, size = 7, hjust = 1,
     face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
     colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
                                 ),
       axis.text.y = element_text(
           size = 7,
           face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
           colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
                          ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
       ))+
 coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" ) +
# rectangles around the three clusters, positioned by eye
  geom_rect(aes(xmin = 0 + 0.5, xmax = 10 - 0.5, ymin = 0 + 0.5, ymax = 10 - 0.5),
               fill = "transparent", color = "gray20", size = 1.5) +
  geom_rect(aes(xmin = 13 + 0.5, xmax = 24 - 0.5, ymin = 13 + 0.5, ymax = 24 - 0.5),
               fill = "transparent", color = "gray20", size = 1.5) +
  geom_rect(aes(xmin = 23 + 0.5, xmax = 37 - 0.5, ymin = 23 + 0.5, ymax = 37 - 0.5),
               fill = "transparent", color = "gray20", size = 1.5)


p_phi_1

# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%

filter(
    (antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5',
                   'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))|
    (antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
      ) %>%
mutate(antigen = factor(antigen, levels = c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12',
                                           'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))) %>%
pass -> tmp

tmp %>% dplyr::select(antigen, Taxon, clr_estimate, clr_p.value, clr_q.value) %>%
mutate(clr_sign = sign(clr_estimate)) %>%
mutate(isHit = ifelse(clr_p.value < 0.05 & clr_q.value < 0.2, 1, 0)) %>%
mutate(Taxon = factor(Taxon, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass -> chorddata
targStat <- "conf.low"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata

phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)

phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp

ggplot(tmp, aes(Var1, Var2, fill =(value))) + 
geom_tile() +
  scale_fill_gradient(high = "grey90", low = "red", 
    space = "Lab", 
    name="phi",
                    limits = c(NA, 3), na.value = "white") +
#  theme_minimal()+ # minimal theme
 theme(axis.text.x = element_text(
     angle = 90, vjust = 1, size = 7, hjust = 1,
     face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
     colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
                                 ),
       axis.text.y = element_text(
           size = 7,
           face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
           colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
                          ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
       ))+
 coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" )-> p_phi_low
p_phi_low

# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
targStat <- "conf.high"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata

phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)

phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp

ggplot(tmp, aes(Var1, Var2, fill =(value))) + 
geom_tile() +
  scale_fill_gradient(high = "grey90", low = "red", 
    space = "Lab", 
    name="phi",
                    limits = c(NA, 3), na.value = "white") +
#  theme_minimal()+ # minimal theme
 theme(axis.text.x = element_text(
     angle = 90, vjust = 1, size = 7, hjust = 1,
     face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
     colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
                                 ),
       axis.text.y = element_text(
           size = 7,
           face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
           colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
                          ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
       ))+
 coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" )-> p_phi_high
p_phi_high

# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
chorddata %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "IgG ", "")) %>%

ggplot(
    aes(x = Taxon, y = antigen, alpha = factor(isHit), color = factor(clr_sign))) +
scale_alpha_discrete(range = c(0, 1)) +
guides(alpha = FALSE) +
theme_minimal() +
     coord_fixed(ratio = 2) +
scale_colour_manual(values = c("red", "blue")) +
 theme(axis.text.x = element_text(
     angle = 90, vjust = 0.5, size = 7, hjust = 1,
     face = ifelse(levels(chorddata$Taxon) %in% useFamily, "bold", "plain"),
     colour = ifelse(levels(chorddata$Taxon) %in% useFamily, "black", "grey30")),
     plot.margin = unit(c(0,3,1,3), "cm")
     ) +
#guides(col = TRUE) +
guides(color=guide_legend(title="Sign of GLM")) +
labs(x = "Family",y = "Antigen -- Month" ) +
geom_point() -> guitar_chords
## Warning: Using alpha for a discrete variable is not advised.
par <- options()
options(repr.plot.width=10, repr.plot.height= 5)
guitar_chords

options(par)
p_phi_1a <- p_phi_1 + 
theme(axis.text.x = element_blank(),
     axis.title.x = element_blank(),
     plot.margin = unit(c(1, 3, -5.5, 4), "cm"))

par <- options()
options(repr.plot.width=8, repr.plot.height= 8)

p_phi_cord <- cowplot::plot_grid(p_phi_1a, guitar_chords, nrow = 2, align = "v")

p_phi_cord

#phi_legend <- cowplot::get_legend(p_phi_1)
# cowplot::ggdraw(
#     cowplot::plot_grid(
#     cowplot::plot_grid(p_phi_1a, guitar_chords, ncol = 1, align = "v"),
#       cowplot::plot_grid(phi_legend, NULL, ncol = 1),
#       rel_widths = c(10,1)
#         ))

 ggsave('figures/phi_heatmap_withlegend.png', width = 10, height = 10)

options(par)

Stacked bars

# More color-blind friendly colorbalettes
#http://colorbrewer2.org/#type=qualitative&scheme=Paired&n=10
cb10 <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a')

cb12 <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a','#ffff99','#b15928')

# Less color-blind friendly, but still nice.
#https://sashat.me/2017/01/11/list-of-20-simple-distinct-colors/
trub20 <- c('#e6194b','#3cb44b','#ffe119','#0082c8','#f58231','#911eb4','#46f0f0','#f032e6','#d2f53c','#fabebe','#008080','#e6beff','#aa6e28','#fffac8','#800000','#aaffc3','#808000','#ffd8b1','#000080','#808080','#FFFFFF','#000000')
options(repr.plot.width=8, repr.plot.height= 4)

Bookmark Here. Stuck for strange reasons.

ordered_pub_id_df <- psN2 %>% sample_data %>% dplyr::select(pub_id, rMDS1, newname, MDS1) %>% arrange(rMDS1) %>% mutate(MDS1 = round(MDS1, 2), fig3tick = paste(pub_id, MDS1,sep = "_"))
## Warning in class(x) <- c(subclass, "tbl_df", "tbl", "data.frame"): Setting
## class(x) to multiple strings ("tbl_df", "tbl", ...); result will no longer
## be an S4 object
ordered_pub_id_df
##    pub_id rMDS1      newname  MDS1  fig3tick
## 1      21     1 01_Sample-89 -0.79  21_-0.79
## 2      45     2 02_Sample-76 -0.70   45_-0.7
## 3     282     3 03_Sample-57 -0.69 282_-0.69
## 4     162     4 04_Sample-73 -0.56 162_-0.56
## 5     236     5 05_Sample-82 -0.53 236_-0.53
## 6      40     6 06_Sample-93 -0.51  40_-0.51
## 7      92     7 07_Sample-98 -0.47  92_-0.47
## 8     238     8 08_Sample-67 -0.40  238_-0.4
## 9     176     9 09_Sample-60 -0.24 176_-0.24
## 10    151    10 10_Sample-72 -0.19 151_-0.19
## 11    290    11 11_Sample-86 -0.16 290_-0.16
## 12     19    12 12_Sample-84 -0.04  19_-0.04
## 13      3    13 13_Sample-87  0.02    3_0.02
## 14     49    14 14_Sample-63  0.03   49_0.03
## 15    247    15 15_Sample-91  0.42  247_0.42
## 16    123    16 16_Sample-62  0.55  123_0.55
## 17    208    17 17_Sample-81  0.65  208_0.65
## 18    175    18 18_Sample-92  0.68  175_0.68
## 19    281    19 19_Sample-78  0.71  281_0.71
## 20    288    20 20_Sample-66  0.77  288_0.77
## 21     89    21 21_Sample-75  1.46   89_1.46
fig3tick <- ordered_pub_id_df %>% pull(fig3tick)
p_phy <- plot_bar(psN2, x = 'newname', fill = 'Phylum') + scale_fill_manual(values = cb10)  + xlab("") +
ggtitle("All Phyla")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
     strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_phy

#ggsave('plots/Phyla_by_wuf1.png')
p_firm <-  subset_taxa(psN2, Phylum == 'Firmicutes') %>%
plot_bar( x = 'newname', fill = 'Order') +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
     strip.text.y = (element_text(angle = 90))) +
scale_fill_manual(values = cb10) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_firm

#ggsave('plots/MostFirmicutesAreClostridiales.png')
p_clostridia <-  subset_taxa(psN2, Class == 'Clostridia') %>%
plot_bar( x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10)  + xlab("") +
ggtitle("Families of Order Clostridiales (All Class Clostridia)")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
     strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_clostridia

# p_porph <-  subset_taxa(psN2, Family == 'Porphyromonadaceae') %>%
# plot_bar( x = 'newname', fill = 'Genus') + scale_fill_manual(values = cb10) #+ theme_bw()
# p_porph
# p_bact <- subset_taxa(psN2, Phylum == 'Bacteroidetes') %>% # all class (Bacteroidia), order (Bacteroidales)
# plot_bar(x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) #+ theme_bw()
# p

# ggsave('figures/Bacteroidetes_Families.png')
p_ClosXI <- subset_taxa(psN2, Family == 'Clostridiales_Incertae_Sedis_XI') %>%
plot_bar( x = 'newname', fill = 'Genus') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Genera of Family Clostridiales_Incertae_Sedis_XI")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
     strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_ClosXI

#ggsave('figures/Clostridiales_Incertae_Sedis_XI_Genus.png')
p_Bact <- subset_taxa(psN2, Phylum == 'Bacteroidetes') %>%
plot_bar( x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Families of Class Bacteroidetes (All Order Bacteroidiales)")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
     strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_Bact

#ggsave('figures/Bacteroides.png')
options(repr.plot.width=14, repr.plot.height= 8)
sb <- cowplot::plot_grid(p_phy, p_Bact,p_clostridia,p_ClosXI,ncol = 2, labels = c("A", "B", "C", "D"))
sb

cowplot::save_plot('figures/stacked_bars.png', sb, base_width = 14, base_height = 8)
cowplot::save_plot('figures/stacked_bars.svg', sb, base_width = 14, base_height = 8)

Exporting OTU tables and Taxa tables at each agglomeration level

# psDf %>%
# mutate(OTU = map(ps, ~data.frame(otu_table(.)))) %>%
# mutate(Tax = map(ps, ~data.frame(tax_table(.)))) %>%
# mutate(OTUCount = map (psCount, ~data.frame(otu_table(.)))) %>%
# pass -> psDf1
psDf %>%
mutate(OTU = map(ps, ~data.frame(otu_table(.)))) %>%
mutate(Tax = map(ps, ~as.data.frame(.@tax_table@.Data))) %>%
mutate(OTUCount = map (psCount, ~data.frame(otu_table(.)))) %>%
pass -> psDf1
psDf1 %>%
.[1:5,] %>%
mutate(Tax = map(Tax, ~dplyr::select(.,-oldname2))) %>%
print
## # A tibble: 5 x 12
##   taxLevels ntaxa psCount ps    jsd   jsdMat kjsd  psNoZero clr   OTU  
##   <chr>     <int> <list>  <lis> <lis> <list> <lis> <list>   <lis> <lis>
## 1 Phylum        5 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 2 Class        12 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 3 Order        17 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 4 Family       39 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 5 Genus       104 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## # ... with 2 more variables: Tax <list>, OTUCount <list>
# https://stackoverflow.com/questions/50341012/return-the-mapped-object-if-expression-inside-of-purrrpossibly-fails/50341205#50341205
rm_oldname2 <- function(x){
    f = possibly(function() dplyr::select(x, -oldname2), otherwise = x)
        f()
}
psDf1 %>%
#.[1:5,] %>%
mutate(Tax = map(Tax, rm_oldname2)) %>%
pass -> psDf1b
print(psDf1)
## # A tibble: 6 x 12
##   taxLevels ntaxa psCount ps    jsd   jsdMat kjsd  psNoZero clr   OTU  
##   <chr>     <int> <list>  <lis> <lis> <list> <lis> <list>   <lis> <lis>
## 1 Phylum        5 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 2 Class        12 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 3 Order        17 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 4 Family       39 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 5 Genus       104 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 6 Species     651 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## # ... with 2 more variables: Tax <list>, OTUCount <list>
# Show which species level OTUs are contained in each agglomerated group:
psDf1b %>%
.[1:5,] %>%
mutate(TaxIdx = map(Tax, function(df){
    df %>%
    mutate(tag = as.character(tag), oldGroups = as.character(oldGroups)) %>%
    dplyr::select(tag, oldGroups) %>%
    mutate(oldGroups = strsplit(oldGroups, ",")) %>%
    unnest(oldGroups)
})) %>%
dplyr::select(taxLevels, TaxIdx) %>%
unnest(TaxIdx) %>%
mutate(oldGroups = trimws(oldGroups)) %>% # Some of these have leading or trailing whitespace
spread(taxLevels, tag) %>%
dplyr::select(oldGroups, Phylum, Class, Order, Family, Genus) %>%
pass -> taxGroupMapping
write_csv(taxGroupMapping, 'tables/taxGroupMapping.csv')
# Print out each otu table (relative abundances).
walk2(psDf1b$taxLevels, psDf1b$OTU, 
      ~write.csv(.y, file = paste0("tables/OTU/otu_",.x, ".csv")))
# Print out each otu table (counts).
walk2(psDf1b$taxLevels, psDf1b$OTUCount, 
      ~write.csv(.y, file = paste0("tables/OTU/otuCount_",.x, ".csv")))
# Print out each taxonomy table.
walk2(psDf1b$taxLevels, psDf1b$Tax, 
      ~write_csv(.y, path = paste0("tables/Tax/tax_",.x, ".csv")))

Response to reviewers: Richness

The reviewer asks if it associates with MDS1. I’ll check for associations with immunogenicity as well.

Calculate the alpha diversity values for psN1

bN1 <- breakaway(psN1)

Initial look at alpha diveristy values.

plot(bN1)

Large confidence intervals. Also, probably best to examine in log space going forward.

Range of breakaway estimates

summary(bN1)$estimate %>% range
## [1] 151.0282 441.4724

Richness vs MDS1

btN_MDS <- betta(summary(bN1)$estimate,
              summary(bN1)$error,
              make_design_matrix(psN1, "MDS1")
)
btN_MDS
## $table
##             Estimates Standard Errors p-values
## (Intercept)  243.9720        16.82711    0.000
## predictors   -31.7198        27.31450    0.246
## 
## $cov
##             (Intercept) predictors
## (Intercept)    287.4724    57.2095
## predictors      57.2095   757.4669
## 
## $ssq_u
## [1] 1932.271
## 
## $homogeneity
## [1] 6.250059e+01 1.546382e-06
## 
## $global
## [1] 215.6907   0.0000
## 
## $blups
##  [1] 190.9549 235.6741 241.4954 196.3625 222.4246 270.5355 262.2978
##  [8] 280.2053 194.4011 282.1363 231.4248 227.0622 253.5892 260.3696
## [15] 266.7135 248.6283 296.0946 232.9051 201.2419 264.0743 264.8205

Not in any way that is statistically significant (p = 0.246)

Some pre-computing

richEsts <- bN1 %>% map_dbl("estimate")

richCI <- bN1 %>% map_df("interval") %>% t %>% as.data.frame() %>% rename(richLb = V1, richUb = V2) %>% merge(as.data.frame(richEsts), by = "row.names") %>% transform(row.names = Row.names, richEst = richEsts, Row.names = NULL, richEsts = NULL)
# need to add mds1 to this, or just tack this all onto psN1rich

psN1rich <- psN1
sample_data(psN1rich) <- merge(psN1@sam_data, richCI, by = "row.names") %>% transform(row.names = Row.names, Row.names = NULL)

Plot richness vs mds1

psN1rich %>% sample_data %>% as.data.frame %>%
  ggplot(aes(x = MDS1, y = richEst)) + geom_point() #+ geom_errorbar()

As above but with error bars.

psN1richP <- psN1rich %>% sample_data %>% as.data.frame %>%
  ggplot(aes(x = MDS1, y = richEst, ymin = richLb, ymax = richUb)) + geom_point() + geom_errorbar() + scale_y_log10()
psN1richP

Unifrac Distance vs Richness

The reviewer actually asked whether unifrac distance associates with alpha diversity. I’ve done that with MDS1, but not distance per se. Lets do one kernel regression, where we ask whether unifrac distance associates with richness.

Kernel regression, weighted unifrac vs richness

mirRich <- MiRKAT(y = richEsts, Ks = wufKN2, out_type = "C", method = 'permutation', nperm = jnperm)
mirRich
## [1] 0.22179

Very similar p-value to running betta against MDS1.

PCoA figure that compares MDS1 and MDS2 to richness

psN1richMDSP<- psN1rich %>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = richEst), size = 5, stroke = 1, shape = 21) +
viridis::scale_fill_viridis(name = 'richEst', direction = 1, option = "viridis") +
  coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) + 
cowplot::theme_cowplot()
psN1richMDSP

Two text examples, where we wil see if we can relate richness to a parameter

Here is a discrete example originally I used Gp120 month 6.5. Switching to gp41 month 6.5, since subsequent analyis suggested that it does show a relationship, and so makes a more useful exemplar

gpLocmtx <- cbind(1,
  psN1 %>% sample_data %>% as.matrix %>% as.data.frame %>% pull(IgG_gp41_Month_6.5) %>% as.character %>% as.numeric %>% medcode
)
colnames(gpLocmtx) = c("(Intercept)", "predictors")
btNLoc <- betta(summary(bN1)$estimate,
              summary(bN1)$error,
              gpLocmtx
)
btNLoc
## $table
##             Estimates Standard Errors p-values
## (Intercept) 209.10249        13.30004    0.000
## predictors   76.38459        22.30439    0.001
## 
## $cov
##             (Intercept) predictors
## (Intercept)    274.4924  -274.4924
## predictors    -274.4924   771.9782
## 
## $ssq_u
## [1] 780.9728
## 
## $homogeneity
## [1] 24.0326989  0.1949011
## 
## $global
## [1] 323.1194   0.0000
## 
## $blups
##  [1] 184.0879 256.1046 284.6409 195.8995 211.0920 290.4048 217.0720
##  [8] 288.8596 204.3579 291.4781 213.4344 210.8080 231.2472 222.1869
## [15] 291.5582 286.9768 296.3785 285.9490 200.8389 280.4231 287.5847

Compare richness to each immunogenicity parameter.

Step 1, make a data frame with BN1, but pub_id for each sample

Pre calculations

SampleNameToPubId <- psN1 %>% sample_data %>% as.data.frame %>% rownames_to_column() %>% as.tibble %>% dplyr::select(rowname, pub_id) %>% na.omit()
## Warning in class(x) <- c(subclass, "tbl_df", "tbl", "data.frame"): Setting
## class(x) to multiple strings ("tbl_df", "tbl", ...); result will no longer
## be an S4 object

A function that takes immunogenicity data, x (which we will pass in with tidyverse mapping functions), ad, the breakaway richness summary, and a transformation, defaults to medcode2 of the immunogenicity data.

immuneValpha <- function(x, ad = bN1, transformation = medcode2){
  #x <- arrange(x, rowname)
  x2 <- right_join(x, SampleNameToPubId, by = 'pub_id')
  x3 <- x2[is.finite(x2$mag),]
  
  ad2 <- ad[is.finite(x2$mag)]
  class(ad2) <- c("alpha_estimates", "list")
  
  gpXmtx = cbind(1, transformation(x3$mag))
  #gpXmtx
  
   thing <- betta(summary(ad2)$estimate,
               summary(ad2)$error,
               gpXmtx
 )
   out <- as.data.frame(t(thing$table[2,]))
   #colnames(out) = "pval"
   
   # add R^2 value, from simple pearson correlation
   locCor <- cor(summary(ad2)$estimate, gpXmtx[,2], method = "pearson")
   r2 <- locCor^2
   out2 <- data.frame(out, r2)
   
   out2
}

# test a single use case
immuneValpha(use.immune %>% filter(visitno == 9, type == "IgG", antigen == "Con.6.gp120.B"), bN1, transformation = medcode)
##   Estimates Standard.Errors p.values          r2
## 1  21.03933        22.41648    0.348 0.002631791

Actually run the analysis, raw table

immuneAlphaCompiled <- use.immune %>% 
  group_by(type, antigen, month) %>%
  do(immuneValpha(.)) %>%
  ungroup # if you forget to ungroup, pvalue calculations don't work correctly
immuneAlphaTable <- immuneAlphaCompiled %>% mutate(qval = p.adjust(`p.values`, method = "BH"))
immuneAlphaTable
## # A tibble: 20 x 8
##    type  antigen   month Estimates Standard.Errors p.values      r2   qval
##    <chr> <chr>     <dbl>     <dbl>           <dbl>    <dbl>   <dbl>  <dbl>
##  1 CD4+  ANY.ENV.…   6.5     40.7             19.9    0.041 3.26e-2 0.0911
##  2 CD4+  ANY.ENV.…  12       44.8             17.9    0.013 5.57e-3 0.0371
##  3 IgA   gp41        0       -1.84            27.3    0.946 3.22e-2 0.946 
##  4 IgA   gp41        6.5     16.3             26.2    0.534 1.45e-1 0.628 
##  5 IgA   gp41       12       68.4             28.7    0.017 2.11e-1 0.0425
##  6 IgA   p24         0        5.46            24.9    0.827 6.81e-3 0.871 
##  7 IgA   p24         6.5     56.6             21.9    0.01  1.60e-1 0.0333
##  8 IgA   p24        12       32.9             23.8    0.168 1.37e-2 0.28  
##  9 IgG   Con.6.gp…   6.5     21.0             22.4    0.348 2.63e-3 0.535 
## 10 IgG   Con.6.gp…  12       17.5             22.7    0.44  1.33e-3 0.587 
## 11 IgG   gp41        0       17.1             25.6    0.503 7.75e-4 0.628 
## 12 IgG   gp41        6.5     76.4             22.3    0.001 2.47e-1 0.01  
## 13 IgG   gp41       12       61.0             23.4    0.009 9.44e-2 0.0333
## 14 IgG   gp70_B.C…   6.5     -5.83            24.6    0.812 6.38e-3 0.871 
## 15 IgG   gp70_B.C…  12       70.5             14.7    0     6.21e-2 0     
## 16 IgG   p24         0      -30.8             20.3    0.129 3.33e-2 0.235 
## 17 IgG   p24         6.5     60.6             23.5    0.01  1.17e-1 0.0333
## 18 IgG   p24        12       47.3             26.4    0.074 2.34e-1 0.148 
## 19 IgG   ZM96.gp1…   6.5     21.9             25.3    0.387 3.55e-2 0.553 
## 20 IgG   ZM96.gp1…  12       66.8             22.0    0.002 9.50e-2 0.0133

Pretty up the table above for publication

conciseImmuneAlphaTable <- immuneAlphaTable %>% 
  dplyr::select(Type = type, Antigen = antigen, Month = month, Coef=Estimates, R2 = r2, P=`p.values`, Q = qval)
toAlphaTable <- conciseImmuneAlphaTable %>%
  mutate(
    Coef = cell_spec(format(Coef, digits = 3), "html",
                     bold = ifelse(P < 0.05, T, F),
                     italic = ifelse(P < 0.05 & Coef < 0, T, F),
                     background = ifelse(P < 0.05, ifelse(Coef < 0, "lightsalmon", "lightblue"), "")),
    R2 = cell_spec(round(R2, digits = 3), "html"),
    P = cell_spec(format_round(P, 3), "html",
                         bold = ifelse(P < 0.05, T, F),
                         background = ifelse(P < 0.05, 'yellow', '')
    ),
    Q = cell_spec(format_round(Q, 3), "html",
                         bold = ifelse(Q < 0.2, T, F),
                         background = ifelse(Q < 0.2, 'lightyellow', '')
    )
  ) %>%
   mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
  mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen))

toAlphaTable %>% kable("html", escape = F, digits = 3, align = 'c') %>%
  kable_styling("striped", "hover", full_width = F)  %>%
  collapse_rows(columns = 1:2, latex_hline = "full") %>%
  pass-> conciseAlphaTable.html

conciseAlphaTable.html %>% cat(file = 'tables/conciseAlphaTable.html')

conciseAlphaTable.html
Type Antigen Month Coef R2 P Q
CD4+ Any ENV PTEG 6.5 40.69 0.033 0.041 0.091
12.0 44.76 0.006 0.013 0.037
IgA gp41 0.0 -1.84 0.032 0.946 0.946
6.5 16.30 0.145 0.534 0.628
12.0 68.38 0.211 0.017 0.042
p24 0.0 5.46 0.007 0.827 0.871
6.5 56.58 0.16 0.010 0.033
12.0 32.85 0.014 0.168 0.280
IgG Con.6.gp120.B 6.5 21.04 0.003 0.348 0.535
12.0 17.55 0.001 0.440 0.587
gp41 0.0 17.12 0.001 0.503 0.628
6.5 76.38 0.247 0.001 0.010
12.0 61.02 0.094 0.009 0.033
gp70 B.CaseA V1-V2 6.5 -5.83 0.006 0.812 0.871
12.0 70.51 0.062 0.000 0.000
p24 0.0 -30.85 0.033 0.129 0.235
6.5 60.60 0.117 0.010 0.033
12.0 47.26 0.234 0.074 0.148
ZM96.gp140 6.5 21.89 0.036 0.387 0.553
12.0 66.78 0.095 0.002 0.013

As above but with continuous immunogenicity data.

immuneAlphaCompiledGaus <- use.immune %>% 
  group_by(type, antigen, month) %>%
  do(immuneValpha(., transformation = jac_box_cox_2)) %>% 
  ungroup
immuneAlphaTableGaus <- immuneAlphaCompiledGaus %>% mutate(qval = p.adjust(`p.values`, method = "BH"))
immuneAlphaTableGaus
## # A tibble: 20 x 8
##    type  antigen    month Estimates Standard.Errors p.values      r2  qval
##    <chr> <chr>      <dbl>     <dbl>           <dbl>    <dbl>   <dbl> <dbl>
##  1 CD4+  ANY.ENV.P…   6.5     30.5            12.6     0.016 1.34e-1 0.064
##  2 CD4+  ANY.ENV.P…  12       27.5            16.4     0.094 9.89e-4 0.235
##  3 IgA   gp41         0       -6.51           18.0     0.718 3.43e-2 0.818
##  4 IgA   gp41         6.5      2.15           15.5     0.89  6.04e-2 0.89 
##  5 IgA   gp41        12       32.9            16.4     0.045 8.62e-2 0.15 
##  6 IgA   p24          0       -6.77           18.5     0.714 2.82e-4 0.818
##  7 IgA   p24          6.5     15.9            17.9     0.374 3.31e-2 0.68 
##  8 IgA   p24         12       25.2            20.8     0.226 6.30e-2 0.452
##  9 IgG   Con.6.gp1…   6.5      6.48           17.2     0.706 2.44e-6 0.818
## 10 IgG   Con.6.gp1…  12        5.36           15.9     0.736 7.92e-5 0.818
## 11 IgG   gp41         0       10.1            15.4     0.51  1.19e-4 0.774
## 12 IgG   gp41         6.5     29.7             9.75    0.002 1.76e-1 0.01 
## 13 IgG   gp41        12       31.7             4.85    0     1.66e-1 0    
## 14 IgG   gp70_B.Ca…   6.5      3.24           18.5     0.861 2.06e-3 0.89 
## 15 IgG   gp70_B.Ca…  12       36.1             6.73    0     3.27e-2 0    
## 16 IgG   p24          0      -14.4            17.9     0.421 1.01e-1 0.702
## 17 IgG   p24          6.5     21.7            16.0     0.175 1.44e-1 0.389
## 18 IgG   p24         12       32.0            17.0     0.06  2.08e-1 0.171
## 19 IgG   ZM96.gp140   6.5      8.40           13.8     0.542 4.73e-4 0.774
## 20 IgG   ZM96.gp140  12       32.2             4.71    0     8.80e-2 0
conciseImmuneAlphaTableGaus <- immuneAlphaTableGaus %>% 
  dplyr::select(Type = type, Antigen = antigen, Month = month, Coef=Estimates, R2 = r2, P=`p.values`, Q = qval)
toAlphaTableGaus <- conciseImmuneAlphaTableGaus %>%
  mutate(
    Coef = cell_spec(format(Coef, digits = 3), "html",
                     bold = ifelse(P < 0.05, T, F),
                     italic = ifelse(P < 0.05 & Coef < 0, T, F),
                     background = ifelse(P < 0.05, ifelse(Coef < 0, "lightsalmon", "lightblue"), "")),
    R2 = cell_spec(ifelse(R2 < 0.01, format(R2, digits = 2),round(R2, digits = 2)) , "html"),
    P = cell_spec(format_round(P, 3), "html",
                         bold = ifelse(P < 0.05, T, F),
                         background = ifelse(P < 0.05, 'yellow', '')
    ),
    Q = cell_spec(format_round(Q, 3), "html",
                         bold = ifelse(Q < 0.2, T, F),
                         background = ifelse(Q < 0.2, 'lightyellow', '')
    )
  ) %>%
   mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
  mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen))

toAlphaTableGaus %>% kable("html", escape = F, digits = 3, align = 'c') %>%
  kable_styling("striped", "hover", full_width = F)  %>%
  collapse_rows(columns = 1:2, latex_hline = "full") %>%
  pass-> conciseAlphaTableGaus.html

conciseAlphaTableGaus.html %>% cat(file = 'tables/conciseAlphaTableGaus.html')

conciseAlphaTableGaus.html
Type Antigen Month Coef R2 P Q
CD4+ Any ENV PTEG 6.5 30.50 0.13 0.016 0.064
12.0 27.49 9.9e-04 0.094 0.235
IgA gp41 0.0 -6.51 0.03 0.718 0.818
6.5 2.15 0.06 0.890 0.890
12.0 32.88 0.09 0.045 0.150
p24 0.0 -6.77 2.8e-04 0.714 0.818
6.5 15.87 0.03 0.374 0.680
12.0 25.20 0.06 0.226 0.452
IgG Con.6.gp120.B 6.5 6.48 2.4e-06 0.706 0.818
12.0 5.36 7.9e-05 0.736 0.818
gp41 0.0 10.14 1.2e-04 0.510 0.774
6.5 29.66 0.18 0.002 0.010
12.0 31.71 0.17 0.000 0.000
gp70 B.CaseA V1-V2 6.5 3.24 2.1e-03 0.861 0.890
12.0 36.15 0.03 0.000 0.000
p24 0.0 -14.39 0.1 0.421 0.702
6.5 21.71 0.14 0.175 0.389
12.0 31.96 0.21 0.060 0.171
ZM96.gp140 6.5 8.40 4.7e-04 0.542 0.774
12.0 32.21 0.09 0.000 0.000

Richness vs alpha above, but this time colorcoding by group.

psN1richP_gp41m6 <- psN1rich %>% sample_data %>% as.data.frame %>%
  ggplot(aes(x = MDS1, y = richEst, ymin = richLb, ymax = richUb, fill = as.factor(medcode_hl(IgG_gp41_Month_6.5)))) + geom_point(shape = 21, size = 4) + geom_errorbar() + scale_y_log10() + scale_fill_viridis_d(direction = -1, name = 'gp41 Primary Timepoint') + labs(x = "MDS1", y = "Richness (# ASVs)") + cowplot::theme_cowplot()
psN1richP_gp41m6

Summary figure for of alpha

Combined figure

par <- options()
#options(repr.plot.width=6, repr.plot.height= 11)
#g <- grid.arrange(wuford_gp41, wuford_gp120, ncol = 2)
g <- cowplot::plot_grid(psN1richMDSP, psN1richP_gp41m6, ncol = 1, labels = c("A", "B"), label_size = 24, align = "v")
g

cowplot::save_plot('figures/richnessMDS1Gp41.png', g, base_width = 6, base_height = 8)

Next question. Do the donor and non donor groups ever have different immune responses?

This is in response to a reviewr comment. Strategy, write a function to, for one variable of interest, compare the groups, as I did for unifrac distance. I’ll use a simple logistic regression here. Similar to CapVar. I want a use.immune data set that includes whether people are a donor as a column, but that

immune.data %>%
mutate(isDonor = pub_id %in% muDoners) %>%
filter(
    (type == 'IgG' & 
    antigen %in% ants1 &
    month %in% c(6.5,12)
    ) |
    (type %in% c('IgG', 'IgA') &
     antigen %in% ants2 &
     month %in% c(0,6.5,12)
    ) |
    type == 'CD4+' &
    antigen == 'ANY.ENV.PTEG' &
    month %in% c(6.5, 12)
      )-> allparticipants.immune
head(allparticipants.immune)
## # A tibble: 6 x 14
##   visitno rx_code type  antigen   mag mag_bl response   day month ct   
##     <int> <chr>   <chr> <chr>   <dbl>  <dbl>    <int> <int> <dbl> <chr>
## 1       9 CTRL    IgA   gp41     177.   214.        0   182   6.5 CTRL 
## 2      12 CTRL    IgA   gp41     173    214.        0   364  12   CTRL 
## 3       9 CTRL    IgA   p24      966    454.        0   182   6.5 CTRL 
## 4      12 CTRL    IgA   p24     1419    454.        0   364  12   CTRL 
## 5       9 CTRL    IgG   Con.6.…    1      1         0   182   6.5 CTRL 
## 6      12 CTRL    IgG   Con.6.…    1      1         0   364  12   CTRL 
## # ... with 4 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>,
## #   isDonor <lgl>
allparticipants.test <- allparticipants.immune %>% filter(visitno ==9, antigen == "Con.6.gp120.B",type == "IgG")
head(allparticipants.test)
## # A tibble: 6 x 14
##   visitno rx_code type  antigen    mag mag_bl response   day month ct   
##     <int> <chr>   <chr> <chr>    <dbl>  <dbl>    <int> <int> <dbl> <chr>
## 1       9 CTRL    IgG   Con.6.…     1    1           0   182   6.5 CTRL 
## 2       9 T2      IgG   Con.6.… 14938.   1           1   182   6.5 T    
## 3       9 T1      IgG   Con.6.… 30458.   1           1   182   6.5 T    
## 4       9 T4      IgG   Con.6.… 29971.   1           1   182   6.5 T    
## 5       9 T3      IgG   Con.6.… 31292.   7.75        1   182   6.5 T    
## 6       9 T4      IgG   Con.6.… 17209  362.          1   182   6.5 T    
## # ... with 4 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>,
## #   isDonor <lgl>
donorTest <- function(x, transformation = medcode2, family = 'binomial'){
  loc_glm <- glm(transformation(mag) ~  isDonor, data = x, family = family)
  loc_glm %>% broom::tidy() %>% filter(term == 'isDonorTRUE') %>% dplyr::select(-term)
}
donorTest(allparticipants.test)
## # A tibble: 1 x 4
##   estimate std.error statistic p.value
##      <dbl>     <dbl>     <dbl>   <dbl>
## 1    0.380     0.506     0.751   0.453

Do on everything

allparticipants.immune %>%
  filter(ct == 'T') %>%
  group_by(type, antigen, visitno) %>%
  do(data.frame(donorTest(.))) %>%
  pass-> DonorEffectTable
DonorEffectTable
## # A tibble: 20 x 7
## # Groups:   type, antigen, visitno [20]
##    type  antigen            visitno  estimate std.error statistic p.value
##    <chr> <chr>                <int>     <dbl>     <dbl>     <dbl>   <dbl>
##  1 CD4+  ANY.ENV.PTEG             9  9.53e- 2     0.520  1.83e- 1   0.855
##  2 CD4+  ANY.ENV.PTEG            12 -1.17e-16     0.526 -2.22e-16   1.000
##  3 IgA   gp41                     2 -3.89e- 1     0.512 -7.60e- 1   0.447
##  4 IgA   gp41                     9 -1.72e- 1     0.518 -3.33e- 1   0.739
##  5 IgA   gp41                    12 -2.75e- 1     0.526 -5.23e- 1   0.601
##  6 IgA   p24                      2  1.29e- 1     0.509  2.54e- 1   0.799
##  7 IgA   p24                      9  3.65e- 1     0.521  7.00e- 1   0.484
##  8 IgA   p24                     12 -2.32e-16     0.524 -4.44e-16   1.000
##  9 IgG   Con.6.gp120.B            9  4.05e- 1     0.523  7.76e- 1   0.438
## 10 IgG   Con.6.gp120.B           12  1.33e- 1     0.516  2.58e- 1   0.797
## 11 IgG   gp41                     2  3.57e- 1     0.513  6.95e- 1   0.487
## 12 IgG   gp41                     9 -1.35e- 1     0.519 -2.59e- 1   0.795
## 13 IgG   gp41                    12  1.33e- 1     0.516  2.58e- 1   0.797
## 14 IgG   gp70_B.CaseA_V1_V2       9  4.05e- 1     0.523  7.76e- 1   0.438
## 15 IgG   gp70_B.CaseA_V1_V2      12 -1.33e- 1     0.516 -2.58e- 1   0.797
## 16 IgG   p24                      2 -1.64e- 1     0.510 -3.22e- 1   0.747
## 17 IgG   p24                      9 -3.92e- 2     0.528 -7.43e- 2   0.941
## 18 IgG   p24                     12 -4.01e- 1     0.520 -7.72e- 1   0.440
## 19 IgG   ZM96.gp140               9  9.53e- 2     0.520  1.83e- 1   0.855
## 20 IgG   ZM96.gp140              12 -4.42e- 1     0.521 -8.47e- 1   0.397

Ok. There appears to be no significant difference in magnitude group for any category.

allparticipants.immune %>%
  filter(ct == 'T') %>%
  group_by(type, antigen, visitno) %>%
  do(data.frame(donorTest(.,  transformation = function(x){jac_box_cox_2(x)},
                     family = 'gaussian'))) %>%
  pass-> DonorEffectGaus

Monkeys, I guess its debug aclock.

DonorEffectGaus
## # A tibble: 20 x 7
## # Groups:   type, antigen, visitno [20]
##    type  antigen            visitno estimate std.error statistic p.value
##    <chr> <chr>                <int>    <dbl>     <dbl>     <dbl>   <dbl>
##  1 CD4+  ANY.ENV.PTEG             9  0.441       0.256    1.72    0.0897
##  2 CD4+  ANY.ENV.PTEG            12  0.0673      0.265    0.254   0.800 
##  3 IgA   gp41                     2 -0.105       0.255   -0.413   0.681 
##  4 IgA   gp41                     9  0.236       0.259    0.911   0.366 
##  5 IgA   gp41                    12  0.349       0.260    1.34    0.184 
##  6 IgA   p24                      2  0.392       0.252    1.56    0.124 
##  7 IgA   p24                      9  0.274       0.258    1.06    0.293 
##  8 IgA   p24                     12  0.313       0.261    1.20    0.234 
##  9 IgG   Con.6.gp120.B            9 -0.0707      0.261   -0.271   0.787 
## 10 IgG   Con.6.gp120.B           12  0.0298      0.260    0.115   0.909 
## 11 IgG   gp41                     2  0.268       0.254    1.05    0.295 
## 12 IgG   gp41                     9 -0.113       0.261   -0.435   0.665 
## 13 IgG   gp41                    12 -0.0472      0.260   -0.182   0.856 
## 14 IgG   gp70_B.CaseA_V1_V2       9  0.185       0.260    0.710   0.480 
## 15 IgG   gp70_B.CaseA_V1_V2      12 -0.207       0.258   -0.799   0.427 
## 16 IgG   p24                      2  0.114       0.256    0.447   0.656 
## 17 IgG   p24                      9  0.242       0.264    0.916   0.363 
## 18 IgG   p24                     12  0.00988     0.260    0.0380  0.970 
## 19 IgG   ZM96.gp140               9  0.0605      0.262    0.231   0.818 
## 20 IgG   ZM96.gp140              12 -0.115       0.260   -0.442   0.660

Same deal when I do a normal logistic regression.

#save.image(file = "workspace.Rdata")